2025-09-07T06:12:28.7250053Z Current runner version: '2.328.0'
2025-09-07T06:12:28.7257372Z Runner name: 'i-09e93d2ae04f2bbfa'
2025-09-07T06:12:28.7258297Z Runner group name: 'default'
2025-09-07T06:12:28.7259274Z Machine name: 'ip-10-0-64-174'
2025-09-07T06:12:28.7262538Z ##[group]GITHUB_TOKEN Permissions
2025-09-07T06:12:28.7265048Z Contents: read
2025-09-07T06:12:28.7265855Z Metadata: read
2025-09-07T06:12:28.7266566Z Packages: read
2025-09-07T06:12:28.7267136Z ##[endgroup]
2025-09-07T06:12:28.7269415Z Secret source: Actions
2025-09-07T06:12:28.7270855Z Prepare workflow directory
2025-09-07T06:12:28.7842380Z Prepare all required actions
2025-09-07T06:12:28.7884982Z Getting action download info
2025-09-07T06:12:29.0668068Z Download action repository 'pytorch/test-infra@main' (SHA:548a4bc624d43a01cdf165a63b041f0ae014ddbd)
2025-09-07T06:12:30.4198829Z Download action repository 'pytorch/pytorch@main' (SHA:93fb23d6fae7c4e82c4239a1033e522088742634)
2025-09-07T06:12:44.7627067Z Download action repository 'actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874' (SHA:50769540e7f4bd5e21e526ee35c689e35e0d6874)
2025-09-07T06:12:45.1607768Z Getting action download info
2025-09-07T06:12:45.2715705Z Download action repository 'actions/checkout@v4' (SHA:08eba0b27e820071cde6df949e0beb9ba4906955)
2025-09-07T06:12:45.5587256Z Complete job name: Build cu128 vLLM wheel
2025-09-07T06:12:45.6256318Z A job started hook has been configured by the self-hosted runner administrator
2025-09-07T06:12:45.6369835Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-09-07T06:12:45.6380134Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T06:12:45.6381020Z ##[endgroup]
2025-09-07T06:12:46.9728115Z Runner Type: linux.12xlarge.memory
2025-09-07T06:12:46.9728689Z Instance Type: r5.12xlarge
2025-09-07T06:12:46.9728979Z AMI Name: unknown
2025-09-07T06:12:46.9759624Z AMI ID: ami-05ffe3c48a9991133
2025-09-07T06:12:52.6510303Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2025-09-07T06:12:52.6510792Z with:
2025-09-07T06:12:52.6511411Z github-secret: ***
2025-09-07T06:12:52.6511691Z activate-with-label: false
2025-09-07T06:12:52.6511984Z label: with-ssh
2025-09-07T06:12:52.6512224Z remove-existing-keys: true
2025-09-07T06:12:52.6512508Z fail-silently: true
2025-09-07T06:12:52.6512747Z env:
2025-09-07T06:12:52.6512952Z PY_VERS: 3.12
2025-09-07T06:12:52.6513254Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8
2025-09-07T06:12:52.6513651Z PLATFORM: manylinux_2_28_x86_64
2025-09-07T06:12:52.6513937Z BUILD_DEVICE: cu128
2025-09-07T06:12:52.6514184Z ##[endgroup]
2025-09-07T06:12:52.7779980Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-09-07T06:12:52.7781830Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-09-07T06:12:52.7980133Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-09-07T06:12:52.7980658Z with:
2025-09-07T06:12:52.7980944Z submodules: false
2025-09-07T06:12:52.7981249Z fetch-depth: 0
2025-09-07T06:12:52.7981499Z env:
2025-09-07T06:12:52.7981730Z PY_VERS: 3.12
2025-09-07T06:12:52.7982075Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8
2025-09-07T06:12:52.7982523Z PLATFORM: manylinux_2_28_x86_64
2025-09-07T06:12:52.7982842Z BUILD_DEVICE: cu128
2025-09-07T06:12:52.7983112Z ##[endgroup]
2025-09-07T06:12:52.8076525Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T06:12:52.8077532Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-09-07T06:12:52.8087617Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T06:12:52.8088034Z env:
2025-09-07T06:12:52.8088308Z PY_VERS: 3.12
2025-09-07T06:12:52.8088621Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8
2025-09-07T06:12:52.8089027Z PLATFORM: manylinux_2_28_x86_64
2025-09-07T06:12:52.8089525Z BUILD_DEVICE: cu128
2025-09-07T06:12:52.8089762Z ##[endgroup]
2025-09-07T06:12:52.8201008Z ##[group]Run # Use all available CPUs for fetching
2025-09-07T06:12:52.8201469Z # Use all available CPUs for fetching
2025-09-07T06:12:52.8201812Z cd "${GITHUB_WORKSPACE}"
2025-09-07T06:12:52.8202162Z git config --global fetch.parallel 0
2025-09-07T06:12:52.8202564Z git config --global submodule.fetchJobs 0
2025-09-07T06:12:52.8202905Z
2025-09-07T06:12:52.8203292Z # Clean workspace. The default checkout action should also do this, but
2025-09-07T06:12:52.8203770Z # do it here as well just in case
2025-09-07T06:12:52.8204099Z if [[ -d .git ]]; then
2025-09-07T06:12:52.8204389Z   if [ -z "${NO_SUDO}" ]; then
2025-09-07T06:12:52.8204722Z     sudo git clean -ffdx
2025-09-07T06:12:52.8205000Z   else
2025-09-07T06:12:52.8205264Z     git clean -ffdx
2025-09-07T06:12:52.8205559Z   fi
2025-09-07T06:12:52.8205950Z fi
2025-09-07T06:12:52.8211648Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-09-07T06:12:52.8212145Z env:
2025-09-07T06:12:52.8212484Z PY_VERS: 3.12
2025-09-07T06:12:52.8212981Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8
2025-09-07T06:12:52.8213742Z PLATFORM: manylinux_2_28_x86_64
2025-09-07T06:12:52.8214157Z BUILD_DEVICE: cu128
2025-09-07T06:12:52.8214534Z NO_SUDO:
2025-09-07T06:12:52.8214816Z ##[endgroup]
2025-09-07T06:12:52.8438275Z ##[group]Run actions/checkout@v4
2025-09-07T06:12:52.8438677Z with:
2025-09-07T06:12:52.8439065Z ref: 93fb23d6fae7c4e82c4239a1033e522088742634
2025-09-07T06:12:52.8439489Z fetch-depth: 0
2025-09-07T06:12:52.8439864Z submodules: false
2025-09-07T06:12:52.8440218Z show-progress: false
2025-09-07T06:12:52.8440616Z repository: pytorch/pytorch
2025-09-07T06:12:52.8441217Z token: ***
2025-09-07T06:12:52.8441531Z ssh-strict: true
2025-09-07T06:12:52.8441890Z ssh-user: git
2025-09-07T06:12:52.8442198Z persist-credentials: true
2025-09-07T06:12:52.8442659Z clean: true
2025-09-07T06:12:52.8442978Z sparse-checkout-cone-mode: true
2025-09-07T06:12:52.8443380Z fetch-tags: false
2025-09-07T06:12:52.8443739Z lfs: false
2025-09-07T06:12:52.8444099Z set-safe-directory: true
2025-09-07T06:12:52.8444457Z env:
2025-09-07T06:12:52.8444801Z PY_VERS: 3.12
2025-09-07T06:12:52.8445229Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8
2025-09-07T06:12:52.8445703Z PLATFORM: manylinux_2_28_x86_64
2025-09-07T06:12:52.8446167Z BUILD_DEVICE: cu128
2025-09-07T06:12:52.8446479Z ##[endgroup]
2025-09-07T06:12:52.9644169Z Syncing repository: pytorch/pytorch
2025-09-07T06:12:52.9646014Z ##[group]Getting Git version info
2025-09-07T06:12:52.9693540Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-09-07T06:12:52.9694501Z [command]/usr/bin/git version
2025-09-07T06:12:52.9694826Z git version 2.47.1
2025-09-07T06:12:52.9696045Z ##[endgroup]
2025-09-07T06:12:52.9700468Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/23b40598-81c5-429a-a237-9bca3db37dee/.gitconfig'
2025-09-07T06:12:52.9701836Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/23b40598-81c5-429a-a237-9bca3db37dee' before making global git config changes
2025-09-07T06:12:52.9702963Z Adding repository directory to the temporary git global config as a safe directory
2025-09-07T06:12:52.9703893Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-09-07T06:12:52.9714196Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-09-07T06:12:52.9717837Z ##[group]Initializing the repository
2025-09-07T06:12:52.9722521Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-09-07T06:12:52.9760211Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-09-07T06:12:52.9760988Z hint: is subject to change. To configure the initial branch name to use in all
2025-09-07T06:12:52.9761878Z hint: of your new repositories, which will suppress this warning, call:
2025-09-07T06:12:52.9762348Z hint:
2025-09-07T06:12:52.9762711Z hint: git config --global init.defaultBranch <name>
2025-09-07T06:12:52.9763092Z hint:
2025-09-07T06:12:52.9763500Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-09-07T06:12:52.9764169Z hint: 'development'. The just-created branch can be renamed via this command:
2025-09-07T06:12:52.9764662Z hint:
2025-09-07T06:12:52.9764910Z hint: git branch -m <name>
2025-09-07T06:12:52.9765517Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2025-09-07T06:12:52.9769113Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-09-07T06:12:52.9797099Z ##[endgroup]
2025-09-07T06:12:52.9797606Z ##[group]Disabling automatic garbage collection
2025-09-07T06:12:52.9801021Z [command]/usr/bin/git config --local gc.auto 0
2025-09-07T06:12:52.9829584Z ##[endgroup]
2025-09-07T06:12:52.9830033Z ##[group]Setting up auth
2025-09-07T06:12:52.9834667Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-09-07T06:12:52.9862431Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-09-07T06:12:53.0195376Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-09-07T06:12:53.0221326Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-09-07T06:12:53.0525998Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-09-07T06:12:53.0570216Z ##[endgroup]
2025-09-07T06:12:53.0570734Z ##[group]Fetching the repository
2025-09-07T06:12:53.0576999Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2025-09-07T06:13:36.7005316Z From https://github.com/pytorch/pytorch
2025-09-07T06:13:36.7005868Z * [new branch] 160583 -> origin/160583
2025-09-07T06:13:36.7006569Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+
2025-09-07T06:13:36.7007137Z * [new branch] 5addvllmbuild -> origin/5addvllmbuild
2025-09-07T06:13:36.7007834Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest
2025-09-07T06:13:36.7008942Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes
2025-09-07T06:13:36.7009633Z * [new branch] ISSUE-154849 -> origin/ISSUE-154849
2025-09-07T06:13:36.7010702Z * [new branch] JackCaoG/dynamo_make_fx_non_core_aten_ops -> origin/JackCaoG/dynamo_make_fx_non_core_aten_ops
2025-09-07T06:13:36.7012480Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128
2025-09-07T06:13:36.7014081Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug
2025-09-07T06:13:36.7015468Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix
2025-09-07T06:13:36.7017299Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue
2025-09-07T06:13:36.7018398Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable
2025-09-07T06:13:36.7019626Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero
2025-09-07T06:13:36.7020947Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging
2025-09-07T06:13:36.7022186Z * [new branch] VLA_exp -> origin/VLA_exp
2025-09-07T06:13:36.7023563Z * [new branch] actually-run-mps-aot-inductor -> origin/actually-run-mps-aot-inductor
2025-09-07T06:13:36.7024735Z * [new branch] add-missing-args-normalization -> origin/add-missing-args-normalization
2025-09-07T06:13:36.7026111Z * [new branch] add-user-guide-structure -> origin/add-user-guide-structure
2025-09-07T06:13:36.7027370Z * [new branch] add-vllm-nightly-build -> origin/add-vllm-nightly-build
2025-09-07T06:13:36.7028516Z * [new branch] add_compile_benchmarking -> origin/add_compile_benchmarking
2025-09-07T06:13:36.7029713Z * [new branch] addmm-heuristic -> origin/addmm-heuristic
2025-09-07T06:13:36.7030894Z * [new branch] addsimde -> origin/addsimde
2025-09-07T06:13:36.7032247Z * [new branch] addvllmtest -> origin/addvllmtest
2025-09-07T06:13:36.7033888Z * [new branch] adi/acl_upgrade -> origin/adi/acl_upgrade
2025-09-07T06:13:36.7034982Z * [new branch] adi/test -> origin/adi/test
2025-09-07T06:13:36.7036252Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm
2025-09-07T06:13:36.7037337Z * [new branch] adi/test_fusions -> origin/adi/test_fusions
2025-09-07T06:13:36.7038486Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9
2025-09-07T06:13:36.7039935Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change
2025-09-07T06:13:36.7040822Z * [new branch] adi/test_timm -> origin/adi/test_timm
2025-09-07T06:13:36.7042384Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change
2025-09-07T06:13:36.7044517Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16
2025-09-07T06:13:36.7045730Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook
2025-09-07T06:13:36.7047886Z * [new branch] alt-disable -> origin/alt-disable
2025-09-07T06:13:36.7048768Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files
2025-09-07T06:13:36.7049854Z * [new branch] angelayi/aoti_inductor_fx -> origin/angelayi/aoti_inductor_fx
2025-09-07T06:13:36.7050998Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark
2025-09-07T06:13:36.7052217Z * [new branch] angelayi/benchmark2 -> origin/angelayi/benchmark2
2025-09-07T06:13:36.7053834Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization
2025-09-07T06:13:36.7055038Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader
2025-09-07T06:13:36.7056503Z * [new branch] angelayi/custom_op_subgraph -> origin/angelayi/custom_op_subgraph
2025-09-07T06:13:36.7057872Z * [new branch] angelayi/customop -> origin/angelayi/customop
2025-09-07T06:13:36.7059465Z * [new branch] angelayi/fake_cache_empty -> origin/angelayi/fake_cache_empty
2025-09-07T06:13:36.7060671Z * [new branch] angelayi/is_symbolic_tracing -> origin/angelayi/is_symbolic_tracing
2025-09-07T06:13:36.7061792Z * [new branch] angelayi/item -> origin/angelayi/item
2025-09-07T06:13:36.7063138Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight
2025-09-07T06:13:36.7064251Z * [new branch] angelayi/opoverload -> origin/angelayi/opoverload
2025-09-07T06:13:36.7065603Z * [new branch] angelayi/pattern -> origin/angelayi/pattern
2025-09-07T06:13:36.7066795Z * [new branch] angelayi/pytree -> origin/angelayi/pytree
2025-09-07T06:13:36.7068020Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers
2025-09-07T06:13:36.7069163Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input
2025-09-07T06:13:36.7070342Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp
2025-09-07T06:13:36.7071479Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size
2025-09-07T06:13:36.7072699Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc
2025-09-07T06:13:36.7073894Z * [new branch] aoti_target_windows -> origin/aoti_target_windows
2025-09-07T06:13:36.7075108Z * [new branch] aoti_weight_sharing -> origin/aoti_weight_sharing
2025-09-07T06:13:36.7076403Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124
2025-09-07T06:13:36.7077596Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1
2025-09-07T06:13:36.7078728Z * [new branch] atalman-patch-1 -> origin/atalman-patch-1
2025-09-07T06:13:36.7080057Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3
2025-09-07T06:13:36.7081186Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4
2025-09-07T06:13:36.7082738Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5
2025-09-07T06:13:36.7083874Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6
2025-09-07T06:13:36.7085239Z * [new branch] atalman_inductor_2.3.0 -> origin/atalman_inductor_2.3.0
2025-09-07T06:13:36.7086332Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1
2025-09-07T06:13:36.7087487Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0
2025-09-07T06:13:36.7088723Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x
2025-09-07T06:13:36.7090070Z * [new branch] autoupdate-transformers-pin-via-pr -> origin/autoupdate-transformers-pin-via-pr
2025-09-07T06:13:36.7091460Z * [new branch] bahuang/dtensor_demo -> origin/bahuang/dtensor_demo
2025-09-07T06:13:36.7093268Z * [new branch] bahuang/test -> origin/bahuang/test
2025-09-07T06:13:36.7095456Z * [new branch] base/1.5 -> origin/base/1.5
2025-09-07T06:13:36.7096551Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention
2025-09-07T06:13:36.7097684Z * [new branch] bc-lint-config -> origin/bc-lint-config
2025-09-07T06:13:36.7098868Z * [new branch] bc-lint-test-new-config -> origin/bc-lint-test-new-config
2025-09-07T06:13:36.7100493Z * [new branch] benchmark-updates -> origin/benchmark-updates
2025-09-07T06:13:36.7101499Z * [new branch] benchmarker_compat_with_do_bench -> origin/benchmarker_compat_with_do_bench
2025-09-07T06:13:36.7102644Z * [new branch] benchmarking-script -> origin/benchmarking-script
2025-09-07T06:13:36.7104510Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26
2025-09-07T06:13:36.7106111Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass
2025-09-07T06:13:36.7107663Z * [new branch] bf/cg-custom-wrapper -> origin/bf/cg-custom-wrapper
2025-09-07T06:13:36.7108701Z * [new branch] bf/cg-or-error -> origin/bf/cg-or-error
2025-09-07T06:13:36.7109846Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check
2025-09-07T06:13:36.7110905Z * [new branch] bf/cg-skip-1-kernel -> origin/bf/cg-skip-1-kernel
2025-09-07T06:13:36.7112016Z * [new branch] bf/cudagraph -> origin/bf/cudagraph
2025-09-07T06:13:36.7113658Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation
2025-09-07T06:13:36.7115370Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark
2025-09-07T06:13:36.7116382Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition
2025-09-07T06:13:36.7117503Z * [new branch] bf/default-recompile-reason -> origin/bf/default-recompile-reason
2025-09-07T06:13:36.7118663Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench
2025-09-07T06:13:36.7119744Z * [new branch] bf/exp -> origin/bf/exp
2025-09-07T06:13:36.7120880Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible
2025-09-07T06:13:36.7122113Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu
2025-09-07T06:13:36.7123311Z * [new branch] bf/partition-turn-on -> origin/bf/partition-turn-on
2025-09-07T06:13:36.7124437Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d
2025-09-07T06:13:36.7125520Z * [new branch] bf/rope -> origin/bf/rope
2025-09-07T06:13:36.7126765Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492
2025-09-07T06:13:36.7127874Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb
2025-09-07T06:13:36.7128984Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129
2025-09-07T06:13:36.7130102Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d
2025-09-07T06:13:36.7131170Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e
2025-09-07T06:13:36.7132302Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c
2025-09-07T06:13:36.7133755Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c
2025-09-07T06:13:36.7134951Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f
2025-09-07T06:13:36.7136122Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0
2025-09-07T06:13:36.7137297Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149
2025-09-07T06:13:36.7138423Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a
2025-09-07T06:13:36.7139526Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b
2025-09-07T06:13:36.7140851Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new
2025-09-07T06:13:36.7141863Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8
2025-09-07T06:13:36.7142970Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2
2025-09-07T06:13:36.7144142Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563
2025-09-07T06:13:36.7146336Z * [new branch] bowbao/bench_updates_stage -> origin/bowbao/bench_updates_stage
2025-09-07T06:13:36.7147379Z * [new branch] bowbao/dort_rewriter -> origin/bowbao/dort_rewriter
2025-09-07T06:13:36.7148417Z * [new branch] bowbao/wip_prs -> origin/bowbao/wip_prs
2025-09-07T06:13:36.7150131Z * [new branch] brister/break_tensorbox -> origin/brister/break_tensorbox
2025-09-07T06:13:36.7151199Z * [new branch] brister/custom_fx_backend -> origin/brister/custom_fx_backend
2025-09-07T06:13:36.7152342Z * [new branch] brister/fx_custom_triton -> origin/brister/fx_custom_triton
2025-09-07T06:13:36.7153415Z * [new branch] brister/tensor_box_output -> origin/brister/tensor_box_output
2025-09-07T06:13:36.7154609Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check
2025-09-07T06:13:36.7155754Z * [new branch] c57382a49 -> origin/c57382a49
2025-09-07T06:13:36.7156854Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa
2025-09-07T06:13:36.7158009Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa
2025-09-07T06:13:36.7160287Z * [new branch] camyll/revert-94bc900da97ad7f3c35b3b819bb53b23c74b581a-for-release-2.8 -> origin/camyll/revert-94bc900da97ad7f3c35b3b819bb53b23c74b581a-for-release-2.8
2025-09-07T06:13:36.7161473Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push
2025-09-07T06:13:36.7162758Z * [new branch] cherry-pick-149654-by-pytorch_bot_bot_ -> origin/cherry-pick-149654-by-pytorch_bot_bot_
2025-09-07T06:13:36.7163944Z * [new branch] cherry-pick-151939-by-pytorch_bot_bot_ -> origin/cherry-pick-151939-by-pytorch_bot_bot_
2025-09-07T06:13:36.7165108Z * [new branch] cherry-pick-154174-by-pytorch_bot_bot_ -> origin/cherry-pick-154174-by-pytorch_bot_bot_
2025-09-07T06:13:36.7166320Z * [new branch] cherry-pick-156260-by-pytorch_bot_bot_ -> origin/cherry-pick-156260-by-pytorch_bot_bot_
2025-09-07T06:13:36.7167575Z * [new branch] cherry-pick-157453-by-pytorch_bot_bot_ -> origin/cherry-pick-157453-by-pytorch_bot_bot_
2025-09-07T06:13:36.7168770Z * [new branch] cherry-pick-157513-by-pytorch_bot_bot_ -> origin/cherry-pick-157513-by-pytorch_bot_bot_
2025-09-07T06:13:36.7170020Z * [new branch] cherry-pick-157695-by-pytorch_bot_bot_ -> origin/cherry-pick-157695-by-pytorch_bot_bot_
2025-09-07T06:13:36.7171176Z * [new branch] cherry-pick-157732-by-pytorch_bot_bot_ -> origin/cherry-pick-157732-by-pytorch_bot_bot_
2025-09-07T06:13:36.7172330Z * [new branch] cherry-pick-158537-by-pytorch_bot_bot_ -> origin/cherry-pick-158537-by-pytorch_bot_bot_
2025-09-07T06:13:36.7173902Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_
2025-09-07T06:13:36.7175131Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_
2025-09-07T06:13:36.7176758Z * [new branch] chilli/flex_vllm -> origin/chilli/flex_vllm
2025-09-07T06:13:36.7178072Z * [new branch] cleanup-inductor-benchmark-images -> origin/cleanup-inductor-benchmark-images
2025-09-07T06:13:36.7179130Z * [new branch] codex-testing -> origin/codex-testing
2025-09-07T06:13:36.7181278Z * [new branch] codex/add-helper-function-to-sizevars.py -> origin/codex/add-helper-function-to-sizevars.py
2025-09-07T06:13:36.7182412Z * [new branch] codex/add-helper-function-to-sizevars.py_2025-09-05 -> origin/codex/add-helper-function-to-sizevars.py_2025-09-05
2025-09-07T06:13:36.7183497Z * [new branch] codex/add-metadata-field-for-file-path -> origin/codex/add-metadata-field-for-file-path
2025-09-07T06:13:36.7185176Z * [new branch] codex/add-test-for-inductor-local-cache-behavior -> origin/codex/add-test-for-inductor-local-cache-behavior
2025-09-07T06:13:36.7186730Z * [new branch] codex/create-test-for-tensor-memory-leak-in-cudagraph -> origin/codex/create-test-for-tensor-memory-leak-in-cudagraph
2025-09-07T06:13:36.7187772Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch
2025-09-07T06:13:36.7188807Z * [new branch] codex/fix-issue-160415-in-pytorch -> origin/codex/fix-issue-160415-in-pytorch
2025-09-07T06:13:36.7190127Z * [new branch] codex/fix-noqengine-quantized-engine-support -> origin/codex/fix-noqengine-quantized-engine-support
2025-09-07T06:13:36.7191156Z * [new branch] codex/fix-pin_memory-error-handling -> origin/codex/fix-pin_memory-error-handling
2025-09-07T06:13:36.7192627Z * [new branch] codex/propose-fix-for-issue-160332 -> origin/codex/propose-fix-for-issue-160332
2025-09-07T06:13:36.7194132Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run
2025-09-07T06:13:36.7195447Z * [new branch] codex/remove-allow-untyped-defs-and-fix-type-errors -> origin/codex/remove-allow-untyped-defs-and-fix-type-errors
2025-09-07T06:13:36.7196547Z * [new branch] compile_fsdp2_disable_stream_and_event -> origin/compile_fsdp2_disable_stream_and_event
2025-09-07T06:13:36.7197472Z * [new branch] context_test -> origin/context_test
2025-09-07T06:13:36.7199208Z * [new branch] copilot/fix-157446 -> origin/copilot/fix-157446
2025-09-07T06:13:36.7200316Z * [new branch] copy_graph -> origin/copy_graph
2025-09-07T06:13:36.7202036Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests
2025-09-07T06:13:36.7203671Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml
2025-09-07T06:13:36.7204953Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs
2025-09-07T06:13:36.7206075Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2
2025-09-07T06:13:36.7207188Z * [new branch] csl/disable_flaky_cpp_test -> origin/csl/disable_flaky_cpp_test
2025-09-07T06:13:36.7208284Z * [new branch] csl/disable_periodic_test -> origin/csl/disable_periodic_test
2025-09-07T06:13:36.7209611Z * [new branch] csl/exclude_rocm_viable_strict -> origin/csl/exclude_rocm_viable_strict
2025-09-07T06:13:36.7211093Z * [new branch] csl/katex -> origin/csl/katex
2025-09-07T06:13:36.7212521Z * [new branch] csl/larger_runner -> origin/csl/larger_runner
2025-09-07T06:13:36.7213970Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff
2025-09-07T06:13:36.7215119Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding
2025-09-07T06:13:36.7216299Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker
2025-09-07T06:13:36.7217473Z * [new branch] csl/name_link_check_job -> origin/csl/name_link_check_job
2025-09-07T06:13:36.7218643Z * [new branch] csl/no_keep_goin_rocm -> origin/csl/no_keep_goin_rocm
2025-09-07T06:13:36.7219803Z * [new branch] csl/not_600_timeout -> origin/csl/not_600_timeout
2025-09-07T06:13:36.7221086Z * [new branch] csl/revert_open -> origin/csl/revert_open
2025-09-07T06:13:36.7222177Z * [new branch] csl/skip_build -> origin/csl/skip_build
2025-09-07T06:13:36.7223415Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner
2025-09-07T06:13:36.7224601Z * [new branch] csl/win_sccache -> origin/csl/win_sccache
2025-09-07T06:13:36.7225924Z * [new branch] cublasltrelax2 -> origin/cublasltrelax2
2025-09-07T06:13:36.7227053Z * [new branch] cublasrelax2 -> origin/cublasrelax2
2025-09-07T06:13:36.7228256Z * [new branch] cudnnsdparefactor -> origin/cudnnsdparefactor
2025-09-07T06:13:36.7229457Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict
2025-09-07T06:13:36.7230546Z * [new branch] czhuge_muon_dev -> origin/czhuge_muon_dev
2025-09-07T06:13:36.7232333Z * [new branch] d4l3k/delete_hook -> origin/d4l3k/delete_hook
2025-09-07T06:13:36.7233390Z * [new branch] dcp_zoc -> origin/dcp_zoc
2025-09-07T06:13:36.7234665Z * [new branch] debug-guard -> origin/debug-guard
2025-09-07T06:13:36.7235801Z * [new branch] delete-quant-docs -> origin/delete-quant-docs
2025-09-07T06:13:36.7239798Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.2 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.2
2025-09-07T06:13:36.7241301Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.3 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.3
2025-09-07T06:13:36.7242793Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.4 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.55.4
2025-09-07T06:13:36.7244262Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.56.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.56.0
2025-09-07T06:13:36.7245522Z * [new branch] dependabot/pip/dot-ci/docker/protobuf-5.29.5 -> origin/dependabot/pip/dot-ci/docker/protobuf-5.29.5
2025-09-07T06:13:36.7247150Z * [new branch] dependabot/pip/dot-github/requirements/protobuf-5.29.5 -> origin/dependabot/pip/dot-github/requirements/protobuf-5.29.5
2025-09-07T06:13:36.7248501Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper
2025-09-07T06:13:36.7249722Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64
2025-09-07T06:13:36.7252059Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd
2025-09-07T06:13:36.7253709Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked
2025-09-07T06:13:36.7255264Z * [new branch] dev/joona/cat -> origin/dev/joona/cat
2025-09-07T06:13:36.7256632Z * [new branch] dev/joona/cat_remove_graph -> origin/dev/joona/cat_remove_graph
2025-09-07T06:13:36.7257788Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag
2025-09-07T06:13:36.7259327Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString
2025-09-07T06:13:36.7261073Z * [new branch] dev/joona/maxpool2dwithindices_errmsg -> origin/dev/joona/maxpool2dwithindices_errmsg
2025-09-07T06:13:36.7262712Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14
2025-09-07T06:13:36.7264437Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa
2025-09-07T06:13:36.7266323Z * [new branch] dev/joona/topk_newapi -> origin/dev/joona/topk_newapi
2025-09-07T06:13:36.7267544Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf
2025-09-07T06:13:36.7268758Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d
2025-09-07T06:13:36.7269994Z * [new branch] disable -> origin/disable
2025-09-07T06:13:36.7271091Z * [new branch] e2e-baseline -> origin/e2e-baseline
2025-09-07T06:13:36.7272226Z * [new branch] eigen_for_sparse_addmm_v2 -> origin/eigen_for_sparse_addmm_v2
2025-09-07T06:13:36.7274031Z * [new branch] embg/test_inductor_ci_128B -> origin/embg/test_inductor_ci_128B
2025-09-07T06:13:36.7275047Z * [new branch] embg/test_inductor_ci_base -> origin/embg/test_inductor_ci_base
2025-09-07T06:13:36.7276286Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control
2025-09-07T06:13:36.7277347Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B
2025-09-07T06:13:36.7278647Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B
2025-09-07T06:13:36.7279954Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1
2025-09-07T06:13:36.7280991Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2
2025-09-07T06:13:36.7282101Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3
2025-09-07T06:13:36.7283394Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4
2025-09-07T06:13:36.7284592Z * [new branch] example-convert-torch.nn -> origin/example-convert-torch.nn
2025-09-07T06:13:36.7286609Z * [new branch] exclamaforte/add-contiguous-threshold -> origin/exclamaforte/add-contiguous-threshold
2025-09-07T06:13:36.7287524Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma
2025-09-07T06:13:36.7288727Z * [new branch] exclamaforte/bump-transformer-version -> origin/exclamaforte/bump-transformer-version
2025-09-07T06:13:36.7289768Z * [new branch] exclamaforte/clear-feedback-savers -> origin/exclamaforte/clear-feedback-savers
2025-09-07T06:13:36.7290903Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run
2025-09-07T06:13:36.7292673Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor
2025-09-07T06:13:36.7294987Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion
2025-09-07T06:13:36.7296171Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning
2025-09-07T06:13:36.7297565Z * [new branch] exclamaforte/fix-exhuastive-autotuning-reland -> origin/exclamaforte/fix-exhuastive-autotuning-reland
2025-09-07T06:13:36.7298778Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg
2025-09-07T06:13:36.7299860Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run
2025-09-07T06:13:36.7301007Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data
2025-09-07T06:13:36.7302290Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run
2025-09-07T06:13:36.7303390Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model
2025-09-07T06:13:36.7304637Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model
2025-09-07T06:13:36.7306046Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection
2025-09-07T06:13:36.7307225Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd
2025-09-07T06:13:36.7308418Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model
2025-09-07T06:13:36.7309450Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor
2025-09-07T06:13:36.7310463Z * [new branch] exclamaforte/max-autotune-ieee -> origin/exclamaforte/max-autotune-ieee
2025-09-07T06:13:36.7311586Z * [new branch] exclamaforte/memory-counter -> origin/exclamaforte/memory-counter
2025-09-07T06:13:36.7312698Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo
2025-09-07T06:13:36.7313829Z * [new branch] exclamaforte/profiler-combo -> origin/exclamaforte/profiler-combo
2025-09-07T06:13:36.7315023Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode
2025-09-07T06:13:36.7316158Z
[new branch] eigen_for_sparse_addmm_v2 -> origin/eigen_for_sparse_addmm_v2 2025-09-07T06:13:36.7274031Z * [new branch] embg/test_inductor_ci_128B -> origin/embg/test_inductor_ci_128B 2025-09-07T06:13:36.7275047Z * [new branch] embg/test_inductor_ci_base -> origin/embg/test_inductor_ci_base 2025-09-07T06:13:36.7276286Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-09-07T06:13:36.7277347Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-09-07T06:13:36.7278647Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-09-07T06:13:36.7279954Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-09-07T06:13:36.7280991Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-09-07T06:13:36.7282101Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-09-07T06:13:36.7283394Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-09-07T06:13:36.7284592Z * [new branch] example-convert-torch.nn -> origin/example-convert-torch.nn 2025-09-07T06:13:36.7286609Z * [new branch] exclamaforte/add-contiguous-threshold -> origin/exclamaforte/add-contiguous-threshold 2025-09-07T06:13:36.7287524Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-09-07T06:13:36.7288727Z * [new branch] exclamaforte/bump-transformer-version -> origin/exclamaforte/bump-transformer-version 2025-09-07T06:13:36.7289768Z * [new branch] exclamaforte/clear-feedback-savers -> origin/exclamaforte/clear-feedback-savers 2025-09-07T06:13:36.7290903Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-09-07T06:13:36.7292673Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-09-07T06:13:36.7294987Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-09-07T06:13:36.7296171Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> 
origin/exclamaforte/fix-exhaustive-autotuning 2025-09-07T06:13:36.7297565Z * [new branch] exclamaforte/fix-exhuastive-autotuning-reland -> origin/exclamaforte/fix-exhuastive-autotuning-reland 2025-09-07T06:13:36.7298778Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-09-07T06:13:36.7299860Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-09-07T06:13:36.7301007Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-09-07T06:13:36.7302290Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-09-07T06:13:36.7303390Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-09-07T06:13:36.7304637Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-09-07T06:13:36.7306046Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-09-07T06:13:36.7307225Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-09-07T06:13:36.7308418Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-09-07T06:13:36.7309450Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-09-07T06:13:36.7310463Z * [new branch] exclamaforte/max-autotune-ieee -> origin/exclamaforte/max-autotune-ieee 2025-09-07T06:13:36.7311586Z * [new branch] exclamaforte/memory-counter -> origin/exclamaforte/memory-counter 2025-09-07T06:13:36.7312698Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-09-07T06:13:36.7313829Z * [new branch] exclamaforte/profiler-combo -> origin/exclamaforte/profiler-combo 2025-09-07T06:13:36.7315023Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-09-07T06:13:36.7316158Z 
* [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-09-07T06:13:36.7317408Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-09-07T06:13:36.7319211Z * [new branch] exclamforte/gemm-model-final -> origin/exclamforte/gemm-model-final 2025-09-07T06:13:36.7320078Z * [new branch] exec -> origin/exec 2025-09-07T06:13:36.7321278Z * [new branch] executorch-module-shim -> origin/executorch-module-shim 2025-09-07T06:13:36.7322527Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-09-07T06:13:36.7323654Z * [new branch] export-D58091437 -> origin/export-D58091437 2025-09-07T06:13:36.7324890Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-09-07T06:13:36.7326634Z * [new branch] export-D70112642 -> origin/export-D70112642 2025-09-07T06:13:36.7327879Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-09-07T06:13:36.7329271Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-09-07T06:13:36.7330761Z * [new branch] export-D75183591 -> origin/export-D75183591 2025-09-07T06:13:36.7331789Z * [new branch] export-D75617432 -> origin/export-D75617432 2025-09-07T06:13:36.7333207Z * [new branch] export-D75659965 -> origin/export-D75659965 2025-09-07T06:13:36.7334380Z * [new branch] export-D76080931 -> origin/export-D76080931 2025-09-07T06:13:36.7335652Z * [new branch] export-D76797250 -> origin/export-D76797250 2025-09-07T06:13:36.7336706Z * [new branch] export-D76885271 -> origin/export-D76885271 2025-09-07T06:13:36.7337830Z * [new branch] export-D76885620 -> origin/export-D76885620 2025-09-07T06:13:36.7339170Z * [new branch] export-D76936623 -> origin/export-D76936623 2025-09-07T06:13:36.7340364Z * [new branch] export-D76958268 -> origin/export-D76958268 2025-09-07T06:13:36.7341508Z * [new branch] export-D78375400 -> origin/export-D78375400 2025-09-07T06:13:36.7342645Z * [new branch] export-D78431305 -> 
origin/export-D78431305 2025-09-07T06:13:36.7343823Z * [new branch] export-D78580107 -> origin/export-D78580107 2025-09-07T06:13:36.7345126Z * [new branch] export-D78822171 -> origin/export-D78822171 2025-09-07T06:13:36.7346538Z * [new branch] export-D78822351 -> origin/export-D78822351 2025-09-07T06:13:36.7347858Z * [new branch] export-D78822507 -> origin/export-D78822507 2025-09-07T06:13:36.7348762Z * [new branch] export-D78826994 -> origin/export-D78826994 2025-09-07T06:13:36.7349846Z * [new branch] export-D78894324 -> origin/export-D78894324 2025-09-07T06:13:36.7351181Z * [new branch] export-D78929245 -> origin/export-D78929245 2025-09-07T06:13:36.7352161Z * [new branch] export-D78934925 -> origin/export-D78934925 2025-09-07T06:13:36.7353333Z * [new branch] export-D78953203 -> origin/export-D78953203 2025-09-07T06:13:36.7354528Z * [new branch] export-D78953229 -> origin/export-D78953229 2025-09-07T06:13:36.7355526Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-09-07T06:13:36.7356686Z * [new branch] export-D78957389 -> origin/export-D78957389 2025-09-07T06:13:36.7357778Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-09-07T06:13:36.7359515Z * [new branch] export-D79026433 -> origin/export-D79026433 2025-09-07T06:13:36.7360625Z * [new branch] export-D79230339 -> origin/export-D79230339 2025-09-07T06:13:36.7361735Z * [new branch] export-D79319835 -> origin/export-D79319835 2025-09-07T06:13:36.7362801Z * [new branch] export-D79328456 -> origin/export-D79328456 2025-09-07T06:13:36.7364073Z * [new branch] export-D79534608 -> origin/export-D79534608 2025-09-07T06:13:36.7365514Z * [new branch] export-D79785974 -> origin/export-D79785974 2025-09-07T06:13:36.7366616Z * [new branch] export-D80025417 -> origin/export-D80025417 2025-09-07T06:13:36.7367853Z * [new branch] export-D80120333 -> origin/export-D80120333 2025-09-07T06:13:36.7369090Z * [new branch] export-D80214882 -> origin/export-D80214882 2025-09-07T06:13:36.7370222Z * [new 
branch] export-D80319069 -> origin/export-D80319069 2025-09-07T06:13:36.7371458Z * [new branch] export-D80321215 -> origin/export-D80321215 2025-09-07T06:13:36.7372534Z * [new branch] export-D80503451 -> origin/export-D80503451 2025-09-07T06:13:36.7374002Z * [new branch] export-D80771648 -> origin/export-D80771648 2025-09-07T06:13:36.7375135Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-09-07T06:13:36.7376366Z * [new branch] export-D80948073 -> origin/export-D80948073 2025-09-07T06:13:36.7377825Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-09-07T06:13:36.7378851Z * [new branch] export-D80970483 -> origin/export-D80970483 2025-09-07T06:13:36.7380003Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-09-07T06:13:36.7381160Z * [new branch] export-D81060182 -> origin/export-D81060182 2025-09-07T06:13:36.7382506Z * [new branch] export-D81078973 -> origin/export-D81078973 2025-09-07T06:13:36.7383607Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-09-07T06:13:36.7384942Z * [new branch] export-D81284190 -> origin/export-D81284190 2025-09-07T06:13:36.7386176Z * [new branch] export-D81299840 -> origin/export-D81299840 2025-09-07T06:13:36.7387296Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-09-07T06:13:36.7388421Z * [new branch] export-D81698719 -> origin/export-D81698719 2025-09-07T06:13:36.7389627Z * [new branch] export-D81747409 -> origin/export-D81747409 2025-09-07T06:13:36.7391065Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-09-07T06:13:36.7393044Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-09-07T06:13:36.7394145Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-09-07T06:13:36.7395598Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-09-07T06:13:36.7397268Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-09-07T06:13:36.7398608Z * [new branch] fca -> origin/fca 
2025-09-07T06:13:36.7399726Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-09-07T06:13:36.7400981Z * [new branch] fca5 -> origin/fca5 2025-09-07T06:13:36.7402751Z * [new branch] feature/function-numa-binding -> origin/feature/function-numa-binding 2025-09-07T06:13:36.7403918Z * [new branch] feature/function-numa-binding-take2 -> origin/feature/function-numa-binding-take2 2025-09-07T06:13:36.7404995Z * [new branch] feature/numa-nproc-fix -> origin/feature/numa-nproc-fix 2025-09-07T06:13:36.7406147Z * [new branch] feature/numa-signpost-serialize -> origin/feature/numa-signpost-serialize 2025-09-07T06:13:36.7407205Z * [new branch] feature/parallel-numa-binding -> origin/feature/parallel-numa-binding 2025-09-07T06:13:36.7408874Z * [new branch] fengyuan/external-proj -> origin/fengyuan/external-proj 2025-09-07T06:13:36.7410114Z * [new branch] fengyuan/out-of-tree-xpu-ops-improve-test -> origin/fengyuan/out-of-tree-xpu-ops-improve-test 2025-09-07T06:13:36.7411322Z * [new branch] fengyuan/out-of-tree-xpu-ops-remove-dtype -> origin/fengyuan/out-of-tree-xpu-ops-remove-dtype 2025-09-07T06:13:36.7412202Z * [new branch] fengyuan/test-xpu -> origin/fengyuan/test-xpu 2025-09-07T06:13:36.7414321Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-09-07T06:13:36.7415439Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-09-07T06:13:36.7417284Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-09-07T06:13:36.7418418Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-09-07T06:13:36.7419512Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-09-07T06:13:36.7420580Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-09-07T06:13:36.7421746Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-09-07T06:13:36.7422887Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-09-07T06:13:36.7424030Z * [new branch] 
findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-09-07T06:13:36.7425385Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-09-07T06:13:36.7426492Z * [new branch] fix -> origin/fix 2025-09-07T06:13:36.7427874Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-09-07T06:13:36.7428809Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-09-07T06:13:36.7430583Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-09-07T06:13:36.7431819Z * [new branch] fix-inductor-periodic-0528 -> origin/fix-inductor-periodic-0528 2025-09-07T06:13:36.7432869Z * [new branch] fix-mps-benchmark -> origin/fix-mps-benchmark 2025-09-07T06:13:36.7434105Z * [new branch] fix-rlease-feature-template -> origin/fix-rlease-feature-template 2025-09-07T06:13:36.7435323Z * [new branch] fix-run-condition-upload-results -> origin/fix-run-condition-upload-results 2025-09-07T06:13:36.7436330Z * [new branch] fix-torchbench -> origin/fix-torchbench 2025-09-07T06:13:36.7437406Z * [new branch] fix_153389 -> origin/fix_153389 2025-09-07T06:13:36.7438786Z * [new branch] fix_fsdp_rs_bucket2 -> origin/fix_fsdp_rs_bucket2 2025-09-07T06:13:36.7439822Z * [new branch] fix_inductor_peridic_tests -> origin/fix_inductor_peridic_tests 2025-09-07T06:13:36.7440885Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-09-07T06:13:36.7442165Z * [new branch] fixes-triage -> origin/fixes-triage 2025-09-07T06:13:36.7443294Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-09-07T06:13:36.7444438Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-09-07T06:13:36.7445646Z * [new branch] flex-flash -> origin/flex-flash 2025-09-07T06:13:36.7446800Z * [new branch] flex-lowering -> origin/flex-lowering 2025-09-07T06:13:36.7447940Z * [new branch] flex-warning -> origin/flex-warning 2025-09-07T06:13:36.7449170Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 
2025-09-07T06:13:36.7450670Z * [new branch] flex_flash -> origin/flex_flash 2025-09-07T06:13:36.7451849Z * [new branch] flexdecode-gqa-groups -> origin/flexdecode-gqa-groups 2025-09-07T06:13:36.7454029Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-09-07T06:13:36.7455090Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-09-07T06:13:36.7456306Z * [new branch] fsdpv2_3d -> origin/fsdpv2_3d 2025-09-07T06:13:36.7457733Z * [new branch] fsdpv2_3d_m1 -> origin/fsdpv2_3d_m1 2025-09-07T06:13:36.7458890Z * [new branch] fx_cpp -> origin/fx_cpp 2025-09-07T06:13:36.7460654Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-09-07T06:13:36.7463620Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-09-07T06:13:36.7464737Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-09-07T06:13:36.7466755Z * [new branch] gh/CaoE/2/base -> origin/gh/CaoE/2/base 2025-09-07T06:13:36.7467805Z * [new branch] gh/CaoE/2/head -> origin/gh/CaoE/2/head 2025-09-07T06:13:36.7468927Z * [new branch] gh/CaoE/2/orig -> origin/gh/CaoE/2/orig 2025-09-07T06:13:36.7471269Z * [new branch] gh/ColinPeppler/79/base -> origin/gh/ColinPeppler/79/base 2025-09-07T06:13:36.7472442Z * [new branch] gh/ColinPeppler/79/head -> origin/gh/ColinPeppler/79/head 2025-09-07T06:13:36.7473541Z * [new branch] gh/ColinPeppler/79/orig -> origin/gh/ColinPeppler/79/orig 2025-09-07T06:13:36.7475417Z * [new branch] gh/ColinPeppler/80/base -> origin/gh/ColinPeppler/80/base 2025-09-07T06:13:36.7476771Z * [new branch] gh/ColinPeppler/80/head -> origin/gh/ColinPeppler/80/head 2025-09-07T06:13:36.7478019Z * [new branch] gh/ColinPeppler/80/orig -> origin/gh/ColinPeppler/80/orig 2025-09-07T06:13:36.7480138Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-09-07T06:13:36.7481207Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-09-07T06:13:36.7483042Z * [new branch] gh/EikanWang/80/base -> 
origin/gh/EikanWang/80/base 2025-09-07T06:13:36.7484119Z * [new branch] gh/EikanWang/80/head -> origin/gh/EikanWang/80/head 2025-09-07T06:13:36.7485225Z * [new branch] gh/EikanWang/80/orig -> origin/gh/EikanWang/80/orig 2025-09-07T06:13:36.7486847Z * [new branch] gh/EikanWang/81/base -> origin/gh/EikanWang/81/base 2025-09-07T06:13:36.7487847Z * [new branch] gh/EikanWang/81/head -> origin/gh/EikanWang/81/head 2025-09-07T06:13:36.7489056Z * [new branch] gh/EikanWang/81/orig -> origin/gh/EikanWang/81/orig 2025-09-07T06:13:36.7490590Z * [new branch] gh/EikanWang/82/base -> origin/gh/EikanWang/82/base 2025-09-07T06:13:36.7491600Z * [new branch] gh/EikanWang/82/head -> origin/gh/EikanWang/82/head 2025-09-07T06:13:36.7494588Z * [new branch] gh/EikanWang/82/orig -> origin/gh/EikanWang/82/orig 2025-09-07T06:13:36.7497151Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-09-07T06:13:36.7498331Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-09-07T06:13:36.7500535Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-09-07T06:13:36.7501676Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-09-07T06:13:36.7502819Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-09-07T06:13:36.7504697Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-09-07T06:13:36.7505932Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-09-07T06:13:36.7507032Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-09-07T06:13:36.7508749Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-09-07T06:13:36.7509758Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-09-07T06:13:36.7510913Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-09-07T06:13:36.7512606Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-09-07T06:13:36.7513698Z * [new branch] gh/H-Huang/182/head -> 
origin/gh/H-Huang/182/head 2025-09-07T06:13:36.7514857Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-09-07T06:13:36.7516636Z * [new branch] gh/H-Huang/187/base -> origin/gh/H-Huang/187/base 2025-09-07T06:13:36.7517601Z * [new branch] gh/H-Huang/187/head -> origin/gh/H-Huang/187/head 2025-09-07T06:13:36.7518689Z * [new branch] gh/H-Huang/187/orig -> origin/gh/H-Huang/187/orig 2025-09-07T06:13:36.7520354Z * [new branch] gh/H-Huang/202/base -> origin/gh/H-Huang/202/base 2025-09-07T06:13:36.7521515Z * [new branch] gh/H-Huang/202/head -> origin/gh/H-Huang/202/head 2025-09-07T06:13:36.7522655Z * [new branch] gh/H-Huang/202/orig -> origin/gh/H-Huang/202/orig 2025-09-07T06:13:36.7524343Z * [new branch] gh/H-Huang/203/base -> origin/gh/H-Huang/203/base 2025-09-07T06:13:36.7525541Z * [new branch] gh/H-Huang/203/head -> origin/gh/H-Huang/203/head 2025-09-07T06:13:36.7526656Z * [new branch] gh/H-Huang/203/orig -> origin/gh/H-Huang/203/orig 2025-09-07T06:13:36.7528302Z * [new branch] gh/H-Huang/204/base -> origin/gh/H-Huang/204/base 2025-09-07T06:13:36.7529353Z * [new branch] gh/H-Huang/204/head -> origin/gh/H-Huang/204/head 2025-09-07T06:13:36.7530433Z * [new branch] gh/H-Huang/204/orig -> origin/gh/H-Huang/204/orig 2025-09-07T06:13:36.7532100Z * [new branch] gh/H-Huang/205/base -> origin/gh/H-Huang/205/base 2025-09-07T06:13:36.7533463Z * [new branch] gh/H-Huang/205/head -> origin/gh/H-Huang/205/head 2025-09-07T06:13:36.7534665Z * [new branch] gh/H-Huang/205/orig -> origin/gh/H-Huang/205/orig 2025-09-07T06:13:36.7536428Z * [new branch] gh/H-Huang/206/base -> origin/gh/H-Huang/206/base 2025-09-07T06:13:36.7537483Z * [new branch] gh/H-Huang/206/head -> origin/gh/H-Huang/206/head 2025-09-07T06:13:36.7538765Z * [new branch] gh/H-Huang/206/orig -> origin/gh/H-Huang/206/orig 2025-09-07T06:13:36.7540366Z * [new branch] gh/H-Huang/207/base -> origin/gh/H-Huang/207/base 2025-09-07T06:13:36.7541433Z * [new branch] gh/H-Huang/207/head -> 
origin/gh/H-Huang/207/head 2025-09-07T06:13:36.7542644Z * [new branch] gh/H-Huang/207/orig -> origin/gh/H-Huang/207/orig 2025-09-07T06:13:36.7544323Z * [new branch] gh/H-Huang/208/base -> origin/gh/H-Huang/208/base 2025-09-07T06:13:36.7545617Z * [new branch] gh/H-Huang/208/head -> origin/gh/H-Huang/208/head 2025-09-07T06:13:36.7546736Z * [new branch] gh/H-Huang/208/orig -> origin/gh/H-Huang/208/orig 2025-09-07T06:13:36.7548327Z * [new branch] gh/H-Huang/209/base -> origin/gh/H-Huang/209/base 2025-09-07T06:13:36.7549352Z * [new branch] gh/H-Huang/209/head -> origin/gh/H-Huang/209/head 2025-09-07T06:13:36.7550445Z * [new branch] gh/H-Huang/209/orig -> origin/gh/H-Huang/209/orig 2025-09-07T06:13:36.7552072Z * [new branch] gh/H-Huang/210/base -> origin/gh/H-Huang/210/base 2025-09-07T06:13:36.7553064Z * [new branch] gh/H-Huang/210/head -> origin/gh/H-Huang/210/head 2025-09-07T06:13:36.7554161Z * [new branch] gh/H-Huang/210/orig -> origin/gh/H-Huang/210/orig 2025-09-07T06:13:36.7556005Z * [new branch] gh/H-Huang/211/base -> origin/gh/H-Huang/211/base 2025-09-07T06:13:36.7557042Z * [new branch] gh/H-Huang/211/head -> origin/gh/H-Huang/211/head 2025-09-07T06:13:36.7558102Z * [new branch] gh/H-Huang/211/orig -> origin/gh/H-Huang/211/orig 2025-09-07T06:13:36.7559688Z * [new branch] gh/H-Huang/212/base -> origin/gh/H-Huang/212/base 2025-09-07T06:13:36.7560759Z * [new branch] gh/H-Huang/212/head -> origin/gh/H-Huang/212/head 2025-09-07T06:13:36.7561818Z * [new branch] gh/H-Huang/212/orig -> origin/gh/H-Huang/212/orig 2025-09-07T06:13:36.7564108Z * [new branch] gh/H-Huang/213/base -> origin/gh/H-Huang/213/base 2025-09-07T06:13:36.7565197Z * [new branch] gh/H-Huang/213/head -> origin/gh/H-Huang/213/head 2025-09-07T06:13:36.7566246Z * [new branch] gh/H-Huang/213/orig -> origin/gh/H-Huang/213/orig 2025-09-07T06:13:36.7567904Z * [new branch] gh/H-Huang/214/base -> origin/gh/H-Huang/214/base 2025-09-07T06:13:36.7568929Z * [new branch] gh/H-Huang/214/head -> 
origin/gh/H-Huang/214/head 2025-09-07T06:13:36.7570193Z * [new branch] gh/H-Huang/214/orig -> origin/gh/H-Huang/214/orig 2025-09-07T06:13:36.7572256Z * [new branch] gh/IvanKobzarev/112/base -> origin/gh/IvanKobzarev/112/base 2025-09-07T06:13:36.7573698Z * [new branch] gh/IvanKobzarev/112/head -> origin/gh/IvanKobzarev/112/head 2025-09-07T06:13:36.7574942Z * [new branch] gh/IvanKobzarev/112/orig -> origin/gh/IvanKobzarev/112/orig 2025-09-07T06:13:36.7576693Z * [new branch] gh/IvanKobzarev/115/base -> origin/gh/IvanKobzarev/115/base 2025-09-07T06:13:36.7577864Z * [new branch] gh/IvanKobzarev/115/head -> origin/gh/IvanKobzarev/115/head 2025-09-07T06:13:36.7579060Z * [new branch] gh/IvanKobzarev/115/orig -> origin/gh/IvanKobzarev/115/orig 2025-09-07T06:13:36.7581182Z * [new branch] gh/IvanKobzarev/116/base -> origin/gh/IvanKobzarev/116/base 2025-09-07T06:13:36.7582379Z * [new branch] gh/IvanKobzarev/116/head -> origin/gh/IvanKobzarev/116/head 2025-09-07T06:13:36.7583614Z * [new branch] gh/IvanKobzarev/116/orig -> origin/gh/IvanKobzarev/116/orig 2025-09-07T06:13:36.7585516Z * [new branch] gh/IvanKobzarev/118/base -> origin/gh/IvanKobzarev/118/base 2025-09-07T06:13:36.7586693Z * [new branch] gh/IvanKobzarev/118/head -> origin/gh/IvanKobzarev/118/head 2025-09-07T06:13:36.7587805Z * [new branch] gh/IvanKobzarev/118/orig -> origin/gh/IvanKobzarev/118/orig 2025-09-07T06:13:36.7589594Z * [new branch] gh/IvanKobzarev/126/base -> origin/gh/IvanKobzarev/126/base 2025-09-07T06:13:36.7590791Z * [new branch] gh/IvanKobzarev/126/head -> origin/gh/IvanKobzarev/126/head 2025-09-07T06:13:36.7592077Z * [new branch] gh/IvanKobzarev/126/orig -> origin/gh/IvanKobzarev/126/orig 2025-09-07T06:13:36.7594201Z * [new branch] gh/IvanKobzarev/127/base -> origin/gh/IvanKobzarev/127/base 2025-09-07T06:13:36.7595316Z * [new branch] gh/IvanKobzarev/127/head -> origin/gh/IvanKobzarev/127/head 2025-09-07T06:13:36.7596473Z * [new branch] gh/IvanKobzarev/127/orig -> origin/gh/IvanKobzarev/127/orig 
2025-09-07T06:13:36.7598250Z * [new branch] gh/IvanKobzarev/128/base -> origin/gh/IvanKobzarev/128/base 2025-09-07T06:13:36.7599318Z * [new branch] gh/IvanKobzarev/128/head -> origin/gh/IvanKobzarev/128/head 2025-09-07T06:13:36.7600467Z * [new branch] gh/IvanKobzarev/128/orig -> origin/gh/IvanKobzarev/128/orig 2025-09-07T06:13:36.7602264Z * [new branch] gh/IvanKobzarev/132/base -> origin/gh/IvanKobzarev/132/base 2025-09-07T06:13:36.7603432Z * [new branch] gh/IvanKobzarev/132/head -> origin/gh/IvanKobzarev/132/head 2025-09-07T06:13:36.7604715Z * [new branch] gh/IvanKobzarev/132/orig -> origin/gh/IvanKobzarev/132/orig 2025-09-07T06:13:36.7606964Z * [new branch] gh/IvanKobzarev/133/base -> origin/gh/IvanKobzarev/133/base 2025-09-07T06:13:36.7608368Z * [new branch] gh/IvanKobzarev/133/head -> origin/gh/IvanKobzarev/133/head 2025-09-07T06:13:36.7609416Z * [new branch] gh/IvanKobzarev/133/orig -> origin/gh/IvanKobzarev/133/orig 2025-09-07T06:13:36.7610999Z * [new branch] gh/IvanKobzarev/134/base -> origin/gh/IvanKobzarev/134/base 2025-09-07T06:13:36.7612037Z * [new branch] gh/IvanKobzarev/134/head -> origin/gh/IvanKobzarev/134/head 2025-09-07T06:13:36.7613412Z * [new branch] gh/IvanKobzarev/134/orig -> origin/gh/IvanKobzarev/134/orig 2025-09-07T06:13:36.7615517Z * [new branch] gh/IvanKobzarev/135/base -> origin/gh/IvanKobzarev/135/base 2025-09-07T06:13:36.7616633Z * [new branch] gh/IvanKobzarev/135/head -> origin/gh/IvanKobzarev/135/head 2025-09-07T06:13:36.7617897Z * [new branch] gh/IvanKobzarev/135/orig -> origin/gh/IvanKobzarev/135/orig 2025-09-07T06:13:36.7619766Z * [new branch] gh/IvanKobzarev/136/base -> origin/gh/IvanKobzarev/136/base 2025-09-07T06:13:36.7620887Z * [new branch] gh/IvanKobzarev/136/head -> origin/gh/IvanKobzarev/136/head 2025-09-07T06:13:36.7622109Z * [new branch] gh/IvanKobzarev/136/orig -> origin/gh/IvanKobzarev/136/orig 2025-09-07T06:13:36.7623650Z * [new branch] gh/IvanKobzarev/137/base -> origin/gh/IvanKobzarev/137/base 
2025-09-07T06:13:36.7624770Z * [new branch] gh/IvanKobzarev/137/head -> origin/gh/IvanKobzarev/137/head 2025-09-07T06:13:36.7626002Z * [new branch] gh/IvanKobzarev/137/orig -> origin/gh/IvanKobzarev/137/orig 2025-09-07T06:13:36.7627783Z * [new branch] gh/IvanKobzarev/138/base -> origin/gh/IvanKobzarev/138/base 2025-09-07T06:13:36.7628838Z * [new branch] gh/IvanKobzarev/138/head -> origin/gh/IvanKobzarev/138/head 2025-09-07T06:13:36.7630069Z * [new branch] gh/IvanKobzarev/138/orig -> origin/gh/IvanKobzarev/138/orig 2025-09-07T06:13:36.7631714Z * [new branch] gh/IvanKobzarev/139/base -> origin/gh/IvanKobzarev/139/base 2025-09-07T06:13:36.7632787Z * [new branch] gh/IvanKobzarev/139/head -> origin/gh/IvanKobzarev/139/head 2025-09-07T06:13:36.7634010Z * [new branch] gh/IvanKobzarev/139/orig -> origin/gh/IvanKobzarev/139/orig 2025-09-07T06:13:36.7635766Z * [new branch] gh/IvanKobzarev/140/base -> origin/gh/IvanKobzarev/140/base 2025-09-07T06:13:36.7636810Z * [new branch] gh/IvanKobzarev/140/head -> origin/gh/IvanKobzarev/140/head 2025-09-07T06:13:36.7637938Z * [new branch] gh/IvanKobzarev/140/orig -> origin/gh/IvanKobzarev/140/orig 2025-09-07T06:13:36.7640096Z * [new branch] gh/IvanKobzarev/141/base -> origin/gh/IvanKobzarev/141/base 2025-09-07T06:13:36.7642501Z * [new branch] gh/IvanKobzarev/141/head -> origin/gh/IvanKobzarev/141/head 2025-09-07T06:13:36.7643240Z * [new branch] gh/IvanKobzarev/141/orig -> origin/gh/IvanKobzarev/141/orig 2025-09-07T06:13:36.7645155Z * [new branch] gh/IvanKobzarev/142/base -> origin/gh/IvanKobzarev/142/base 2025-09-07T06:13:36.7645940Z * [new branch] gh/IvanKobzarev/142/head -> origin/gh/IvanKobzarev/142/head 2025-09-07T06:13:36.7647073Z * [new branch] gh/IvanKobzarev/142/orig -> origin/gh/IvanKobzarev/142/orig 2025-09-07T06:13:36.7648887Z * [new branch] gh/IvanKobzarev/143/base -> origin/gh/IvanKobzarev/143/base 2025-09-07T06:13:36.7650038Z * [new branch] gh/IvanKobzarev/143/head -> origin/gh/IvanKobzarev/143/head 
2025-09-07T06:13:36.7651212Z * [new branch] gh/IvanKobzarev/143/orig -> origin/gh/IvanKobzarev/143/orig 2025-09-07T06:13:36.7653369Z * [new branch] gh/IvanKobzarev/144/base -> origin/gh/IvanKobzarev/144/base 2025-09-07T06:13:36.7654540Z * [new branch] gh/IvanKobzarev/144/head -> origin/gh/IvanKobzarev/144/head 2025-09-07T06:13:36.7655713Z * [new branch] gh/IvanKobzarev/144/orig -> origin/gh/IvanKobzarev/144/orig 2025-09-07T06:13:36.7657526Z * [new branch] gh/IvanKobzarev/145/base -> origin/gh/IvanKobzarev/145/base 2025-09-07T06:13:36.7658682Z * [new branch] gh/IvanKobzarev/145/head -> origin/gh/IvanKobzarev/145/head 2025-09-07T06:13:36.7659900Z * [new branch] gh/IvanKobzarev/145/orig -> origin/gh/IvanKobzarev/145/orig 2025-09-07T06:13:36.7661611Z * [new branch] gh/IvanKobzarev/146/base -> origin/gh/IvanKobzarev/146/base 2025-09-07T06:13:36.7662781Z * [new branch] gh/IvanKobzarev/146/head -> origin/gh/IvanKobzarev/146/head 2025-09-07T06:13:36.7663990Z * [new branch] gh/IvanKobzarev/146/orig -> origin/gh/IvanKobzarev/146/orig 2025-09-07T06:13:36.7666240Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-09-07T06:13:36.7667427Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-09-07T06:13:36.7668863Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-09-07T06:13:36.7669916Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-09-07T06:13:36.7671752Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-09-07T06:13:36.7672882Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-09-07T06:13:36.7674708Z * [new branch] gh/PaliC/1/base -> origin/gh/PaliC/1/base 2025-09-07T06:13:36.7675761Z * [new branch] gh/PaliC/1/head -> origin/gh/PaliC/1/head 2025-09-07T06:13:36.7676853Z * [new branch] gh/PaliC/1/orig -> origin/gh/PaliC/1/orig 2025-09-07T06:13:36.7678606Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 
2025-09-07T06:13:36.7679732Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-09-07T06:13:36.7680874Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-09-07T06:13:36.7682651Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-09-07T06:13:36.7683613Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-09-07T06:13:36.7684730Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-09-07T06:13:36.7686324Z * [new branch] gh/PaliC/2/base -> origin/gh/PaliC/2/base 2025-09-07T06:13:36.7687355Z * [new branch] gh/PaliC/2/head -> origin/gh/PaliC/2/head 2025-09-07T06:13:36.7688474Z * [new branch] gh/PaliC/2/orig -> origin/gh/PaliC/2/orig 2025-09-07T06:13:36.7690291Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-09-07T06:13:36.7691324Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-09-07T06:13:36.7693099Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-09-07T06:13:36.7694819Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-09-07T06:13:36.7695938Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-09-07T06:13:36.7697357Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-09-07T06:13:36.7698945Z * [new branch] gh/PaliC/22/base -> origin/gh/PaliC/22/base 2025-09-07T06:13:36.7700056Z * [new branch] gh/PaliC/22/head -> origin/gh/PaliC/22/head 2025-09-07T06:13:36.7701176Z * [new branch] gh/PaliC/22/orig -> origin/gh/PaliC/22/orig 2025-09-07T06:13:36.7702769Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-09-07T06:13:36.7703862Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-09-07T06:13:36.7705246Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-09-07T06:13:36.7706841Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-09-07T06:13:36.7707868Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-09-07T06:13:36.7708963Z * [new branch] gh/PaliC/24/orig -> 
origin/gh/PaliC/24/orig 2025-09-07T06:13:36.7710973Z * [new branch] gh/PaulZhang12/17/base -> origin/gh/PaulZhang12/17/base 2025-09-07T06:13:36.7712022Z * [new branch] gh/PaulZhang12/17/head -> origin/gh/PaulZhang12/17/head 2025-09-07T06:13:36.7713788Z * [new branch] gh/PaulZhang12/20/base -> origin/gh/PaulZhang12/20/base 2025-09-07T06:13:36.7714846Z * [new branch] gh/PaulZhang12/20/head -> origin/gh/PaulZhang12/20/head 2025-09-07T06:13:36.7715924Z * [new branch] gh/PaulZhang12/20/orig -> origin/gh/PaulZhang12/20/orig 2025-09-07T06:13:36.7717550Z * [new branch] gh/PaulZhang12/21/base -> origin/gh/PaulZhang12/21/base 2025-09-07T06:13:36.7718710Z * [new branch] gh/PaulZhang12/21/head -> origin/gh/PaulZhang12/21/head 2025-09-07T06:13:36.7719804Z * [new branch] gh/PaulZhang12/21/orig -> origin/gh/PaulZhang12/21/orig 2025-09-07T06:13:36.7721490Z * [new branch] gh/PaulZhang12/22/base -> origin/gh/PaulZhang12/22/base 2025-09-07T06:13:36.7722519Z * [new branch] gh/PaulZhang12/22/head -> origin/gh/PaulZhang12/22/head 2025-09-07T06:13:36.7723600Z * [new branch] gh/PaulZhang12/22/orig -> origin/gh/PaulZhang12/22/orig 2025-09-07T06:13:36.7725209Z * [new branch] gh/PaulZhang12/23/base -> origin/gh/PaulZhang12/23/base 2025-09-07T06:13:36.7726260Z * [new branch] gh/PaulZhang12/23/head -> origin/gh/PaulZhang12/23/head 2025-09-07T06:13:36.7727344Z * [new branch] gh/PaulZhang12/23/orig -> origin/gh/PaulZhang12/23/orig 2025-09-07T06:13:36.7728846Z * [new branch] gh/PaulZhang12/24/base -> origin/gh/PaulZhang12/24/base 2025-09-07T06:13:36.7730012Z * [new branch] gh/PaulZhang12/24/head -> origin/gh/PaulZhang12/24/head 2025-09-07T06:13:36.7731046Z * [new branch] gh/PaulZhang12/24/orig -> origin/gh/PaulZhang12/24/orig 2025-09-07T06:13:36.7732905Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-09-07T06:13:36.7734243Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-09-07T06:13:36.7735383Z * [new branch] gh/PaulZhang12/25/orig -> 
origin/gh/PaulZhang12/25/orig 2025-09-07T06:13:36.7737422Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-09-07T06:13:36.7738554Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-09-07T06:13:36.7741146Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-09-07T06:13:36.7742585Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-09-07T06:13:36.7744046Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-09-07T06:13:36.7745810Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-09-07T06:13:36.7747755Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-09-07T06:13:36.7748830Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-09-07T06:13:36.7750501Z * [new branch] gh/StrongerXi/133/base -> origin/gh/StrongerXi/133/base 2025-09-07T06:13:36.7751660Z * [new branch] gh/StrongerXi/133/head -> origin/gh/StrongerXi/133/head 2025-09-07T06:13:36.7752723Z * [new branch] gh/StrongerXi/133/orig -> origin/gh/StrongerXi/133/orig 2025-09-07T06:13:36.7754326Z * [new branch] gh/StrongerXi/134/base -> origin/gh/StrongerXi/134/base 2025-09-07T06:13:36.7755382Z * [new branch] gh/StrongerXi/134/head -> origin/gh/StrongerXi/134/head 2025-09-07T06:13:36.7756534Z * [new branch] gh/StrongerXi/134/orig -> origin/gh/StrongerXi/134/orig 2025-09-07T06:13:36.7758132Z * [new branch] gh/StrongerXi/136/base -> origin/gh/StrongerXi/136/base 2025-09-07T06:13:36.7759158Z * [new branch] gh/StrongerXi/136/head -> origin/gh/StrongerXi/136/head 2025-09-07T06:13:36.7760314Z * [new branch] gh/StrongerXi/136/orig -> origin/gh/StrongerXi/136/orig 2025-09-07T06:13:36.7761850Z * [new branch] gh/StrongerXi/137/base -> origin/gh/StrongerXi/137/base 2025-09-07T06:13:36.7762915Z * [new branch] gh/StrongerXi/137/head -> origin/gh/StrongerXi/137/head 2025-09-07T06:13:36.7764007Z * [new branch] 
gh/StrongerXi/137/orig -> origin/gh/StrongerXi/137/orig 2025-09-07T06:13:36.7765622Z * [new branch] gh/StrongerXi/138/base -> origin/gh/StrongerXi/138/base 2025-09-07T06:13:36.7766834Z * [new branch] gh/StrongerXi/138/head -> origin/gh/StrongerXi/138/head 2025-09-07T06:13:36.7767905Z * [new branch] gh/StrongerXi/138/orig -> origin/gh/StrongerXi/138/orig 2025-09-07T06:13:36.7769510Z * [new branch] gh/StrongerXi/139/base -> origin/gh/StrongerXi/139/base 2025-09-07T06:13:36.7770557Z * [new branch] gh/StrongerXi/139/head -> origin/gh/StrongerXi/139/head 2025-09-07T06:13:36.7771718Z * [new branch] gh/StrongerXi/139/orig -> origin/gh/StrongerXi/139/orig 2025-09-07T06:13:36.7773597Z * [new branch] gh/StrongerXi/140/base -> origin/gh/StrongerXi/140/base 2025-09-07T06:13:36.7774666Z * [new branch] gh/StrongerXi/140/head -> origin/gh/StrongerXi/140/head 2025-09-07T06:13:36.7776089Z * [new branch] gh/StrongerXi/140/orig -> origin/gh/StrongerXi/140/orig 2025-09-07T06:13:36.7777754Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-09-07T06:13:36.7778754Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-09-07T06:13:36.7780322Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-09-07T06:13:36.7781372Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-09-07T06:13:36.7783537Z * [new branch] gh/XilunWu/133/base -> origin/gh/XilunWu/133/base 2025-09-07T06:13:36.7784565Z * [new branch] gh/XilunWu/133/head -> origin/gh/XilunWu/133/head 2025-09-07T06:13:36.7786085Z * [new branch] gh/XilunWu/133/orig -> origin/gh/XilunWu/133/orig 2025-09-07T06:13:36.7787579Z * [new branch] gh/XilunWu/139/base -> origin/gh/XilunWu/139/base 2025-09-07T06:13:36.7788656Z * [new branch] gh/XilunWu/139/head -> origin/gh/XilunWu/139/head 2025-09-07T06:13:36.7789684Z * [new branch] gh/XilunWu/139/orig -> origin/gh/XilunWu/139/orig 2025-09-07T06:13:36.7791378Z * [new branch] gh/XilunWu/143/base -> 
origin/gh/XilunWu/143/base 2025-09-07T06:13:36.7792947Z * [new branch] gh/XilunWu/143/head -> origin/gh/XilunWu/143/head 2025-09-07T06:13:36.7794193Z * [new branch] gh/XilunWu/143/orig -> origin/gh/XilunWu/143/orig 2025-09-07T06:13:36.7796056Z * [new branch] gh/XilunWu/144/base -> origin/gh/XilunWu/144/base 2025-09-07T06:13:36.7797165Z * [new branch] gh/XilunWu/144/head -> origin/gh/XilunWu/144/head 2025-09-07T06:13:36.7798351Z * [new branch] gh/XilunWu/144/orig -> origin/gh/XilunWu/144/orig 2025-09-07T06:13:36.7800160Z * [new branch] gh/XilunWu/145/base -> origin/gh/XilunWu/145/base 2025-09-07T06:13:36.7801220Z * [new branch] gh/XilunWu/145/head -> origin/gh/XilunWu/145/head 2025-09-07T06:13:36.7802389Z * [new branch] gh/XilunWu/145/orig -> origin/gh/XilunWu/145/orig 2025-09-07T06:13:36.7803934Z * [new branch] gh/XilunWu/146/base -> origin/gh/XilunWu/146/base 2025-09-07T06:13:36.7805105Z * [new branch] gh/XilunWu/146/head -> origin/gh/XilunWu/146/head 2025-09-07T06:13:36.7806222Z * [new branch] gh/XilunWu/146/orig -> origin/gh/XilunWu/146/orig 2025-09-07T06:13:36.7807860Z * [new branch] gh/XilunWu/147/base -> origin/gh/XilunWu/147/base 2025-09-07T06:13:36.7808939Z * [new branch] gh/XilunWu/147/head -> origin/gh/XilunWu/147/head 2025-09-07T06:13:36.7810098Z * [new branch] gh/XilunWu/147/orig -> origin/gh/XilunWu/147/orig 2025-09-07T06:13:36.7811587Z * [new branch] gh/XilunWu/148/base -> origin/gh/XilunWu/148/base 2025-09-07T06:13:36.7812689Z * [new branch] gh/XilunWu/148/head -> origin/gh/XilunWu/148/head 2025-09-07T06:13:36.7814153Z * [new branch] gh/XilunWu/148/orig -> origin/gh/XilunWu/148/orig 2025-09-07T06:13:36.7815685Z * [new branch] gh/XilunWu/149/base -> origin/gh/XilunWu/149/base 2025-09-07T06:13:36.7816769Z * [new branch] gh/XilunWu/149/head -> origin/gh/XilunWu/149/head 2025-09-07T06:13:36.7817977Z * [new branch] gh/XilunWu/149/orig -> origin/gh/XilunWu/149/orig 2025-09-07T06:13:36.7819937Z * [new branch] gh/XilunWu/150/base -> 
origin/gh/XilunWu/150/base 2025-09-07T06:13:36.7821075Z * [new branch] gh/XilunWu/150/head -> origin/gh/XilunWu/150/head 2025-09-07T06:13:36.7822200Z * [new branch] gh/XilunWu/150/orig -> origin/gh/XilunWu/150/orig 2025-09-07T06:13:36.7823918Z * [new branch] gh/XilunWu/151/base -> origin/gh/XilunWu/151/base 2025-09-07T06:13:36.7825293Z * [new branch] gh/XilunWu/151/head -> origin/gh/XilunWu/151/head 2025-09-07T06:13:36.7826468Z * [new branch] gh/XilunWu/151/orig -> origin/gh/XilunWu/151/orig 2025-09-07T06:13:36.7827986Z * [new branch] gh/XilunWu/152/base -> origin/gh/XilunWu/152/base 2025-09-07T06:13:36.7828940Z * [new branch] gh/XilunWu/152/head -> origin/gh/XilunWu/152/head 2025-09-07T06:13:36.7829989Z * [new branch] gh/XilunWu/152/orig -> origin/gh/XilunWu/152/orig 2025-09-07T06:13:36.7831756Z * [new branch] gh/XilunWu/153/base -> origin/gh/XilunWu/153/base 2025-09-07T06:13:36.7832828Z * [new branch] gh/XilunWu/153/head -> origin/gh/XilunWu/153/head 2025-09-07T06:13:36.7833922Z * [new branch] gh/XilunWu/153/orig -> origin/gh/XilunWu/153/orig 2025-09-07T06:13:36.7835874Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-09-07T06:13:36.7836878Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-09-07T06:13:36.7837981Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-09-07T06:13:36.7839883Z * [new branch] gh/XilunWu/161/base -> origin/gh/XilunWu/161/base 2025-09-07T06:13:36.7840909Z * [new branch] gh/XilunWu/161/head -> origin/gh/XilunWu/161/head 2025-09-07T06:13:36.7842015Z * [new branch] gh/XilunWu/161/orig -> origin/gh/XilunWu/161/orig 2025-09-07T06:13:36.7843803Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-09-07T06:13:36.7844883Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-09-07T06:13:36.7846043Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-09-07T06:13:36.7848213Z * [new branch] gh/XilunWu/164/base -> 
origin/gh/XilunWu/164/base 2025-09-07T06:13:36.7849426Z * [new branch] gh/XilunWu/164/head -> origin/gh/XilunWu/164/head 2025-09-07T06:13:36.7850654Z * [new branch] gh/XilunWu/164/orig -> origin/gh/XilunWu/164/orig 2025-09-07T06:13:36.7852424Z * [new branch] gh/XilunWu/165/base -> origin/gh/XilunWu/165/base 2025-09-07T06:13:36.7854106Z * [new branch] gh/XilunWu/165/head -> origin/gh/XilunWu/165/head 2025-09-07T06:13:36.7855216Z * [new branch] gh/XilunWu/165/orig -> origin/gh/XilunWu/165/orig 2025-09-07T06:13:36.7857087Z * [new branch] gh/XilunWu/166/base -> origin/gh/XilunWu/166/base 2025-09-07T06:13:36.7858234Z * [new branch] gh/XilunWu/166/head -> origin/gh/XilunWu/166/head 2025-09-07T06:13:36.7859411Z * [new branch] gh/XilunWu/166/orig -> origin/gh/XilunWu/166/orig 2025-09-07T06:13:36.7861182Z * [new branch] gh/XilunWu/167/base -> origin/gh/XilunWu/167/base 2025-09-07T06:13:36.7862371Z * [new branch] gh/XilunWu/167/head -> origin/gh/XilunWu/167/head 2025-09-07T06:13:36.7863565Z * [new branch] gh/XilunWu/167/orig -> origin/gh/XilunWu/167/orig 2025-09-07T06:13:36.7865587Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-09-07T06:13:36.7866607Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-09-07T06:13:36.7867629Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-09-07T06:13:36.7869276Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-09-07T06:13:36.7870331Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-09-07T06:13:36.7871447Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-09-07T06:13:36.7872966Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-09-07T06:13:36.7874097Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-09-07T06:13:36.7875114Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-09-07T06:13:36.7877121Z * [new branch] gh/XuehaiPan/14/base -> 
origin/gh/XuehaiPan/14/base 2025-09-07T06:13:36.7878257Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-09-07T06:13:36.7879298Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-09-07T06:13:36.7880942Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-09-07T06:13:36.7882026Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-09-07T06:13:36.7883169Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-09-07T06:13:36.7884890Z * [new branch] gh/XuehaiPan/189/base -> origin/gh/XuehaiPan/189/base 2025-09-07T06:13:36.7885895Z * [new branch] gh/XuehaiPan/189/head -> origin/gh/XuehaiPan/189/head 2025-09-07T06:13:36.7887021Z * [new branch] gh/XuehaiPan/189/orig -> origin/gh/XuehaiPan/189/orig 2025-09-07T06:13:36.7888571Z * [new branch] gh/XuehaiPan/232/base -> origin/gh/XuehaiPan/232/base 2025-09-07T06:13:36.7889627Z * [new branch] gh/XuehaiPan/232/head -> origin/gh/XuehaiPan/232/head 2025-09-07T06:13:36.7890753Z * [new branch] gh/XuehaiPan/232/orig -> origin/gh/XuehaiPan/232/orig 2025-09-07T06:13:36.7892952Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-09-07T06:13:36.7894691Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-09-07T06:13:36.7895743Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-09-07T06:13:36.7897339Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-09-07T06:13:36.7898454Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-09-07T06:13:36.7899618Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-09-07T06:13:36.7901222Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-09-07T06:13:36.7902302Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-09-07T06:13:36.7903570Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 
2025-09-07T06:13:36.7905589Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-09-07T06:13:36.7906633Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-09-07T06:13:36.7907666Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-09-07T06:13:36.7909237Z * [new branch] gh/XuehaiPan/257/base -> origin/gh/XuehaiPan/257/base 2025-09-07T06:13:36.7910281Z * [new branch] gh/XuehaiPan/257/head -> origin/gh/XuehaiPan/257/head 2025-09-07T06:13:36.7911379Z * [new branch] gh/XuehaiPan/257/orig -> origin/gh/XuehaiPan/257/orig 2025-09-07T06:13:36.7912920Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-09-07T06:13:36.7913951Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-09-07T06:13:36.7915071Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-09-07T06:13:36.7916614Z * [new branch] gh/XuehaiPan/290/base -> origin/gh/XuehaiPan/290/base 2025-09-07T06:13:36.7917723Z * [new branch] gh/XuehaiPan/290/head -> origin/gh/XuehaiPan/290/head 2025-09-07T06:13:36.7918890Z * [new branch] gh/XuehaiPan/290/orig -> origin/gh/XuehaiPan/290/orig 2025-09-07T06:13:36.7928234Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-09-07T06:13:36.7928916Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-09-07T06:13:36.7929539Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-09-07T06:13:36.7930185Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-09-07T06:13:36.7930821Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-09-07T06:13:36.7931437Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-09-07T06:13:36.7932063Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-09-07T06:13:36.7932775Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-09-07T06:13:36.7933596Z * [new 
branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 2025-09-07T06:13:36.7934253Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-09-07T06:13:36.7934981Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-09-07T06:13:36.7935630Z * [new branch] gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-09-07T06:13:36.7936269Z * [new branch] gh/XuehaiPan/356/base -> origin/gh/XuehaiPan/356/base 2025-09-07T06:13:36.7936940Z * [new branch] gh/XuehaiPan/356/head -> origin/gh/XuehaiPan/356/head 2025-09-07T06:13:36.7938186Z * [new branch] gh/XuehaiPan/356/orig -> origin/gh/XuehaiPan/356/orig 2025-09-07T06:13:36.7939759Z * [new branch] gh/XuehaiPan/357/base -> origin/gh/XuehaiPan/357/base 2025-09-07T06:13:36.7940864Z * [new branch] gh/XuehaiPan/357/head -> origin/gh/XuehaiPan/357/head 2025-09-07T06:13:36.7942053Z * [new branch] gh/XuehaiPan/357/orig -> origin/gh/XuehaiPan/357/orig 2025-09-07T06:13:36.7944093Z * [new branch] gh/XuehaiPan/358/base -> origin/gh/XuehaiPan/358/base 2025-09-07T06:13:36.7945306Z * [new branch] gh/XuehaiPan/358/head -> origin/gh/XuehaiPan/358/head 2025-09-07T06:13:36.7946433Z * [new branch] gh/XuehaiPan/358/orig -> origin/gh/XuehaiPan/358/orig 2025-09-07T06:13:36.7948066Z * [new branch] gh/XuehaiPan/359/base -> origin/gh/XuehaiPan/359/base 2025-09-07T06:13:36.7949157Z * [new branch] gh/XuehaiPan/359/head -> origin/gh/XuehaiPan/359/head 2025-09-07T06:13:36.7950264Z * [new branch] gh/XuehaiPan/359/orig -> origin/gh/XuehaiPan/359/orig 2025-09-07T06:13:36.7951808Z * [new branch] gh/XuehaiPan/360/base -> origin/gh/XuehaiPan/360/base 2025-09-07T06:13:36.7952962Z * [new branch] gh/XuehaiPan/360/head -> origin/gh/XuehaiPan/360/head 2025-09-07T06:13:36.7954130Z * [new branch] gh/XuehaiPan/360/orig -> origin/gh/XuehaiPan/360/orig 2025-09-07T06:13:36.7955778Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-09-07T06:13:36.7956852Z * [new branch] gh/XuehaiPan/365/head -> 
origin/gh/XuehaiPan/365/head 2025-09-07T06:13:36.7957982Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-09-07T06:13:36.7959631Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-09-07T06:13:36.7960697Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-09-07T06:13:36.7962297Z * [new branch] gh/XuehaiPan/369/base -> origin/gh/XuehaiPan/369/base 2025-09-07T06:13:36.7963411Z * [new branch] gh/XuehaiPan/369/head -> origin/gh/XuehaiPan/369/head 2025-09-07T06:13:36.7964554Z * [new branch] gh/XuehaiPan/369/orig -> origin/gh/XuehaiPan/369/orig 2025-09-07T06:13:36.7966056Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-09-07T06:13:36.7967156Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-09-07T06:13:36.7968272Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-09-07T06:13:36.7969915Z * [new branch] gh/XuehaiPan/380/base -> origin/gh/XuehaiPan/380/base 2025-09-07T06:13:36.7971270Z * [new branch] gh/XuehaiPan/380/head -> origin/gh/XuehaiPan/380/head 2025-09-07T06:13:36.7972374Z * [new branch] gh/XuehaiPan/380/orig -> origin/gh/XuehaiPan/380/orig 2025-09-07T06:13:36.7974379Z * [new branch] gh/XuehaiPan/381/base -> origin/gh/XuehaiPan/381/base 2025-09-07T06:13:36.7975458Z * [new branch] gh/XuehaiPan/381/head -> origin/gh/XuehaiPan/381/head 2025-09-07T06:13:36.7977206Z * [new branch] gh/XuehaiPan/382/base -> origin/gh/XuehaiPan/382/base 2025-09-07T06:13:36.7978343Z * [new branch] gh/XuehaiPan/382/head -> origin/gh/XuehaiPan/382/head 2025-09-07T06:13:36.7979501Z * [new branch] gh/XuehaiPan/382/orig -> origin/gh/XuehaiPan/382/orig 2025-09-07T06:13:36.7981234Z * [new branch] gh/XuehaiPan/383/base -> origin/gh/XuehaiPan/383/base 2025-09-07T06:13:36.7982360Z * [new branch] gh/XuehaiPan/383/head -> origin/gh/XuehaiPan/383/head 2025-09-07T06:13:36.7983505Z * [new branch] gh/XuehaiPan/383/orig -> origin/gh/XuehaiPan/383/orig 
2025-09-07T06:13:36.7985154Z * [new branch] gh/XuehaiPan/384/base -> origin/gh/XuehaiPan/384/base 2025-09-07T06:13:36.7986332Z * [new branch] gh/XuehaiPan/384/head -> origin/gh/XuehaiPan/384/head 2025-09-07T06:13:36.7987454Z * [new branch] gh/XuehaiPan/384/orig -> origin/gh/XuehaiPan/384/orig 2025-09-07T06:13:36.7989168Z * [new branch] gh/XuehaiPan/385/base -> origin/gh/XuehaiPan/385/base 2025-09-07T06:13:36.7990206Z * [new branch] gh/XuehaiPan/385/head -> origin/gh/XuehaiPan/385/head 2025-09-07T06:13:36.7991269Z * [new branch] gh/XuehaiPan/385/orig -> origin/gh/XuehaiPan/385/orig 2025-09-07T06:13:36.7993370Z * [new branch] gh/XuehaiPan/386/base -> origin/gh/XuehaiPan/386/base 2025-09-07T06:13:36.7994418Z * [new branch] gh/XuehaiPan/386/head -> origin/gh/XuehaiPan/386/head 2025-09-07T06:13:36.7995650Z * [new branch] gh/XuehaiPan/386/orig -> origin/gh/XuehaiPan/386/orig 2025-09-07T06:13:36.7997240Z * [new branch] gh/XuehaiPan/387/base -> origin/gh/XuehaiPan/387/base 2025-09-07T06:13:36.7998369Z * [new branch] gh/XuehaiPan/387/head -> origin/gh/XuehaiPan/387/head 2025-09-07T06:13:36.7999515Z * [new branch] gh/XuehaiPan/387/orig -> origin/gh/XuehaiPan/387/orig 2025-09-07T06:13:36.8001455Z * [new branch] gh/ZainRizvi/1/base -> origin/gh/ZainRizvi/1/base 2025-09-07T06:13:36.8002552Z * [new branch] gh/ZainRizvi/1/head -> origin/gh/ZainRizvi/1/head 2025-09-07T06:13:36.8004098Z * [new branch] gh/ZainRizvi/2/base -> origin/gh/ZainRizvi/2/base 2025-09-07T06:13:36.8005208Z * [new branch] gh/ZainRizvi/2/head -> origin/gh/ZainRizvi/2/head 2025-09-07T06:13:36.8006775Z * [new branch] gh/ZainRizvi/3/base -> origin/gh/ZainRizvi/3/base 2025-09-07T06:13:36.8007796Z * [new branch] gh/ZainRizvi/3/head -> origin/gh/ZainRizvi/3/head 2025-09-07T06:13:36.8009344Z * [new branch] gh/ZainRizvi/4/base -> origin/gh/ZainRizvi/4/base 2025-09-07T06:13:36.8010432Z * [new branch] gh/ZainRizvi/4/head -> origin/gh/ZainRizvi/4/head 2025-09-07T06:13:36.8012002Z * [new branch] gh/ZainRizvi/5/base -> 
origin/gh/ZainRizvi/5/base 2025-09-07T06:13:36.8012999Z * [new branch] gh/ZainRizvi/5/head -> origin/gh/ZainRizvi/5/head 2025-09-07T06:13:36.8014804Z * [new branch] gh/ZainRizvi/6/base -> origin/gh/ZainRizvi/6/base 2025-09-07T06:13:36.8015892Z * [new branch] gh/ZainRizvi/6/head -> origin/gh/ZainRizvi/6/head 2025-09-07T06:13:36.8017053Z * [new branch] gh/ZainRizvi/6/orig -> origin/gh/ZainRizvi/6/orig 2025-09-07T06:13:36.8018660Z * [new branch] gh/ZainRizvi/7/base -> origin/gh/ZainRizvi/7/base 2025-09-07T06:13:36.8019748Z * [new branch] gh/ZainRizvi/7/head -> origin/gh/ZainRizvi/7/head 2025-09-07T06:13:36.8035441Z * [new branch] gh/ZainRizvi/7/orig -> origin/gh/ZainRizvi/7/orig 2025-09-07T06:13:36.8036111Z * [new branch] gh/ZainRizvi/8/base -> origin/gh/ZainRizvi/8/base 2025-09-07T06:13:36.8036750Z * [new branch] gh/ZainRizvi/8/head -> origin/gh/ZainRizvi/8/head 2025-09-07T06:13:36.8037368Z * [new branch] gh/ZainRizvi/9/base -> origin/gh/ZainRizvi/9/base 2025-09-07T06:13:36.8038143Z * [new branch] gh/ZainRizvi/9/head -> origin/gh/ZainRizvi/9/head 2025-09-07T06:13:36.8038755Z * [new branch] gh/ZainRizvi/9/orig -> origin/gh/ZainRizvi/9/orig 2025-09-07T06:13:36.8039375Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-09-07T06:13:36.8040031Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-09-07T06:13:36.8040676Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-09-07T06:13:36.8041304Z * [new branch] gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-09-07T06:13:36.8041954Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-09-07T06:13:36.8042603Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-09-07T06:13:36.8043250Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-09-07T06:13:36.8043898Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-09-07T06:13:36.8044530Z 
* [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-09-07T06:13:36.8045176Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-09-07T06:13:36.8045804Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-09-07T06:13:36.8046457Z * [new branch] gh/ZhiweiYan-96/64/base -> origin/gh/ZhiweiYan-96/64/base 2025-09-07T06:13:36.8047102Z * [new branch] gh/ZhiweiYan-96/64/head -> origin/gh/ZhiweiYan-96/64/head 2025-09-07T06:13:36.8047736Z * [new branch] gh/ZhiweiYan-96/64/orig -> origin/gh/ZhiweiYan-96/64/orig 2025-09-07T06:13:36.8048385Z * [new branch] gh/ZhiweiYan-96/65/base -> origin/gh/ZhiweiYan-96/65/base 2025-09-07T06:13:36.8049027Z * [new branch] gh/ZhiweiYan-96/65/head -> origin/gh/ZhiweiYan-96/65/head 2025-09-07T06:13:36.8050119Z * [new branch] gh/ZhiweiYan-96/65/orig -> origin/gh/ZhiweiYan-96/65/orig 2025-09-07T06:13:36.8051735Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-09-07T06:13:36.8052884Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-09-07T06:13:36.8054711Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-09-07T06:13:36.8055757Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-09-07T06:13:36.8057278Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-09-07T06:13:36.8058388Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-09-07T06:13:36.8059469Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-09-07T06:13:36.8061454Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-09-07T06:13:36.8063200Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-09-07T06:13:36.8064631Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-09-07T06:13:36.8065771Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 
2025-09-07T06:13:36.8067399Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-09-07T06:13:36.8068431Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-09-07T06:13:36.8069512Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-09-07T06:13:36.8071460Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-09-07T06:13:36.8073014Z * [new branch] gh/alexsamardzic/9/base -> origin/gh/alexsamardzic/9/base 2025-09-07T06:13:36.8074157Z * [new branch] gh/alexsamardzic/9/head -> origin/gh/alexsamardzic/9/head 2025-09-07T06:13:36.8075359Z * [new branch] gh/alexsamardzic/9/orig -> origin/gh/alexsamardzic/9/orig 2025-09-07T06:13:36.8077385Z * [new branch] gh/amjames/18/base -> origin/gh/amjames/18/base 2025-09-07T06:13:36.8078489Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-09-07T06:13:36.8079633Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-09-07T06:13:36.8081744Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-09-07T06:13:36.8082922Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-09-07T06:13:36.8084089Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-09-07T06:13:36.8085885Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 2025-09-07T06:13:36.8087062Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-09-07T06:13:36.8088268Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-09-07T06:13:36.8089850Z * [new branch] gh/andrewor14/51/base -> origin/gh/andrewor14/51/base 2025-09-07T06:13:36.8091003Z * [new branch] gh/andrewor14/51/orig -> origin/gh/andrewor14/51/orig 2025-09-07T06:13:36.8093849Z * [new branch] gh/andyanwang/1/base -> origin/gh/andyanwang/1/base 2025-09-07T06:13:36.8094812Z * [new branch] gh/andyanwang/1/head -> origin/gh/andyanwang/1/head 
2025-09-07T06:13:36.8096019Z * [new branch] gh/andyanwang/1/orig -> origin/gh/andyanwang/1/orig 2025-09-07T06:13:36.8097847Z * [new branch] gh/andyanwang/13/base -> origin/gh/andyanwang/13/base 2025-09-07T06:13:36.8098993Z * [new branch] gh/andyanwang/13/head -> origin/gh/andyanwang/13/head 2025-09-07T06:13:36.8100962Z * [new branch] gh/andyanwang/13/orig -> origin/gh/andyanwang/13/orig 2025-09-07T06:13:36.8102591Z * [new branch] gh/andyanwang/2/base -> origin/gh/andyanwang/2/base 2025-09-07T06:13:36.8103716Z * [new branch] gh/andyanwang/2/head -> origin/gh/andyanwang/2/head 2025-09-07T06:13:36.8105040Z * [new branch] gh/andyanwang/2/orig -> origin/gh/andyanwang/2/orig 2025-09-07T06:13:36.8106715Z * [new branch] gh/andyanwang/28/base -> origin/gh/andyanwang/28/base 2025-09-07T06:13:36.8108006Z * [new branch] gh/andyanwang/28/head -> origin/gh/andyanwang/28/head 2025-09-07T06:13:36.8109070Z * [new branch] gh/andyanwang/28/orig -> origin/gh/andyanwang/28/orig 2025-09-07T06:13:36.8110695Z * [new branch] gh/andyanwang/3/base -> origin/gh/andyanwang/3/base 2025-09-07T06:13:36.8111840Z * [new branch] gh/andyanwang/3/head -> origin/gh/andyanwang/3/head 2025-09-07T06:13:36.8112976Z * [new branch] gh/andyanwang/3/orig -> origin/gh/andyanwang/3/orig 2025-09-07T06:13:36.8114596Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-09-07T06:13:36.8115958Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-09-07T06:13:36.8117554Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-09-07T06:13:36.8118939Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-09-07T06:13:36.8120915Z * [new branch] gh/andyanwang/32/base -> origin/gh/andyanwang/32/base 2025-09-07T06:13:36.8122061Z * [new branch] gh/andyanwang/32/head -> origin/gh/andyanwang/32/head 2025-09-07T06:13:36.8123362Z * [new branch] gh/andyanwang/32/orig -> origin/gh/andyanwang/32/orig 2025-09-07T06:13:36.8125062Z * [new branch] 
gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-09-07T06:13:36.8126250Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-09-07T06:13:36.8127400Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-09-07T06:13:36.8129115Z * [new branch] gh/andyanwang/4/base -> origin/gh/andyanwang/4/base 2025-09-07T06:13:36.8130133Z * [new branch] gh/andyanwang/4/head -> origin/gh/andyanwang/4/head 2025-09-07T06:13:36.8131366Z * [new branch] gh/andyanwang/4/orig -> origin/gh/andyanwang/4/orig 2025-09-07T06:13:36.8133564Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-09-07T06:13:36.8134779Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-09-07T06:13:36.8136395Z * [new branch] gh/angelayi/111/base -> origin/gh/angelayi/111/base 2025-09-07T06:13:36.8137484Z * [new branch] gh/angelayi/111/head -> origin/gh/angelayi/111/head 2025-09-07T06:13:36.8138664Z * [new branch] gh/angelayi/111/orig -> origin/gh/angelayi/111/orig 2025-09-07T06:13:36.8140300Z * [new branch] gh/angelayi/112/base -> origin/gh/angelayi/112/base 2025-09-07T06:13:36.8141513Z * [new branch] gh/angelayi/112/head -> origin/gh/angelayi/112/head 2025-09-07T06:13:36.8142709Z * [new branch] gh/angelayi/112/orig -> origin/gh/angelayi/112/orig 2025-09-07T06:13:36.8144527Z * [new branch] gh/angelayi/113/base -> origin/gh/angelayi/113/base 2025-09-07T06:13:36.8145708Z * [new branch] gh/angelayi/113/head -> origin/gh/angelayi/113/head 2025-09-07T06:13:36.8146743Z * [new branch] gh/angelayi/113/orig -> origin/gh/angelayi/113/orig 2025-09-07T06:13:36.8148395Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-09-07T06:13:36.8149401Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-09-07T06:13:36.8150507Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-09-07T06:13:36.8152132Z * [new branch] gh/angelayi/115/base -> origin/gh/angelayi/115/base 
2025-09-07T06:13:36.8153288Z * [new branch] gh/angelayi/115/head -> origin/gh/angelayi/115/head 2025-09-07T06:13:36.8154428Z * [new branch] gh/angelayi/115/orig -> origin/gh/angelayi/115/orig 2025-09-07T06:13:36.8156534Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-09-07T06:13:36.8157514Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-09-07T06:13:36.8158665Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-09-07T06:13:36.8160315Z * [new branch] gh/anijain2305/766/base -> origin/gh/anijain2305/766/base 2025-09-07T06:13:36.8161325Z * [new branch] gh/anijain2305/766/head -> origin/gh/anijain2305/766/head 2025-09-07T06:13:36.8162405Z * [new branch] gh/anijain2305/766/orig -> origin/gh/anijain2305/766/orig 2025-09-07T06:13:36.8164040Z * [new branch] gh/anijain2305/790/base -> origin/gh/anijain2305/790/base 2025-09-07T06:13:36.8165106Z * [new branch] gh/anijain2305/790/head -> origin/gh/anijain2305/790/head 2025-09-07T06:13:36.8166294Z * [new branch] gh/anijain2305/790/orig -> origin/gh/anijain2305/790/orig 2025-09-07T06:13:36.8167887Z * [new branch] gh/anijain2305/792/base -> origin/gh/anijain2305/792/base 2025-09-07T06:13:36.8168944Z * [new branch] gh/anijain2305/792/head -> origin/gh/anijain2305/792/head 2025-09-07T06:13:36.8170116Z * [new branch] gh/anijain2305/792/orig -> origin/gh/anijain2305/792/orig 2025-09-07T06:13:36.8171693Z * [new branch] gh/anijain2305/803/base -> origin/gh/anijain2305/803/base 2025-09-07T06:13:36.8172867Z * [new branch] gh/anijain2305/803/head -> origin/gh/anijain2305/803/head 2025-09-07T06:13:36.8174243Z * [new branch] gh/anijain2305/803/orig -> origin/gh/anijain2305/803/orig 2025-09-07T06:13:36.8175850Z * [new branch] gh/anijain2305/804/base -> origin/gh/anijain2305/804/base 2025-09-07T06:13:36.8176958Z * [new branch] gh/anijain2305/804/head -> origin/gh/anijain2305/804/head 2025-09-07T06:13:36.8178119Z * [new branch] gh/anijain2305/804/orig -> 
origin/gh/anijain2305/804/orig 2025-09-07T06:13:36.8179902Z * [new branch] gh/anijain2305/805/base -> origin/gh/anijain2305/805/base 2025-09-07T06:13:36.8181049Z * [new branch] gh/anijain2305/805/head -> origin/gh/anijain2305/805/head 2025-09-07T06:13:36.8182220Z * [new branch] gh/anijain2305/805/orig -> origin/gh/anijain2305/805/orig 2025-09-07T06:13:36.8184022Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-09-07T06:13:36.8185249Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-09-07T06:13:36.8186382Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-09-07T06:13:36.8188198Z * [new branch] gh/anijain2305/812/base -> origin/gh/anijain2305/812/base 2025-09-07T06:13:36.8189185Z * [new branch] gh/anijain2305/812/head -> origin/gh/anijain2305/812/head 2025-09-07T06:13:36.8190324Z * [new branch] gh/anijain2305/812/orig -> origin/gh/anijain2305/812/orig 2025-09-07T06:13:36.8192080Z * [new branch] gh/anijain2305/838/base -> origin/gh/anijain2305/838/base 2025-09-07T06:13:36.8195074Z * [new branch] gh/anijain2305/838/head -> origin/gh/anijain2305/838/head 2025-09-07T06:13:36.8196573Z * [new branch] gh/anijain2305/838/orig -> origin/gh/anijain2305/838/orig 2025-09-07T06:13:36.8198305Z * [new branch] gh/anijain2305/839/base -> origin/gh/anijain2305/839/base 2025-09-07T06:13:36.8199460Z * [new branch] gh/anijain2305/839/head -> origin/gh/anijain2305/839/head 2025-09-07T06:13:36.8200660Z * [new branch] gh/anijain2305/839/orig -> origin/gh/anijain2305/839/orig 2025-09-07T06:13:36.8202354Z * [new branch] gh/anijain2305/843/base -> origin/gh/anijain2305/843/base 2025-09-07T06:13:36.8203653Z * [new branch] gh/anijain2305/843/head -> origin/gh/anijain2305/843/head 2025-09-07T06:13:36.8204810Z * [new branch] gh/anijain2305/843/orig -> origin/gh/anijain2305/843/orig 2025-09-07T06:13:36.8206445Z * [new branch] gh/anijain2305/844/base -> origin/gh/anijain2305/844/base 2025-09-07T06:13:36.8207535Z * 
[new branch] gh/anijain2305/844/head -> origin/gh/anijain2305/844/head 2025-09-07T06:13:36.8208668Z * [new branch] gh/anijain2305/844/orig -> origin/gh/anijain2305/844/orig 2025-09-07T06:13:36.8210301Z * [new branch] gh/anijain2305/846/base -> origin/gh/anijain2305/846/base 2025-09-07T06:13:36.8211437Z * [new branch] gh/anijain2305/846/head -> origin/gh/anijain2305/846/head 2025-09-07T06:13:36.8212572Z * [new branch] gh/anijain2305/846/orig -> origin/gh/anijain2305/846/orig 2025-09-07T06:13:36.8214666Z * [new branch] gh/anijain2305/848/base -> origin/gh/anijain2305/848/base 2025-09-07T06:13:36.8215863Z * [new branch] gh/anijain2305/848/head -> origin/gh/anijain2305/848/head 2025-09-07T06:13:36.8217019Z * [new branch] gh/anijain2305/848/orig -> origin/gh/anijain2305/848/orig 2025-09-07T06:13:36.8218685Z * [new branch] gh/anijain2305/849/base -> origin/gh/anijain2305/849/base 2025-09-07T06:13:36.8219837Z * [new branch] gh/anijain2305/849/head -> origin/gh/anijain2305/849/head 2025-09-07T06:13:36.8220944Z * [new branch] gh/anijain2305/849/orig -> origin/gh/anijain2305/849/orig 2025-09-07T06:13:36.8223062Z * [new branch] gh/anijain2305/850/base -> origin/gh/anijain2305/850/base 2025-09-07T06:13:36.8224193Z * [new branch] gh/anijain2305/850/head -> origin/gh/anijain2305/850/head 2025-09-07T06:13:36.8225496Z * [new branch] gh/anijain2305/850/orig -> origin/gh/anijain2305/850/orig 2025-09-07T06:13:36.8227204Z * [new branch] gh/anijain2305/851/base -> origin/gh/anijain2305/851/base 2025-09-07T06:13:36.8228270Z * [new branch] gh/anijain2305/851/head -> origin/gh/anijain2305/851/head 2025-09-07T06:13:36.8229409Z * [new branch] gh/anijain2305/851/orig -> origin/gh/anijain2305/851/orig 2025-09-07T06:13:36.8231175Z * [new branch] gh/anijain2305/852/base -> origin/gh/anijain2305/852/base 2025-09-07T06:13:36.8232239Z * [new branch] gh/anijain2305/852/head -> origin/gh/anijain2305/852/head 2025-09-07T06:13:36.8233800Z * [new branch] gh/anijain2305/852/orig -> 
origin/gh/anijain2305/852/orig 2025-09-07T06:13:36.8235039Z * [new branch] gh/anijain2305/853/base -> origin/gh/anijain2305/853/base 2025-09-07T06:13:36.8236064Z * [new branch] gh/anijain2305/853/head -> origin/gh/anijain2305/853/head 2025-09-07T06:13:36.8237172Z * [new branch] gh/anijain2305/853/orig -> origin/gh/anijain2305/853/orig 2025-09-07T06:13:36.8238864Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-09-07T06:13:36.8240011Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 2025-09-07T06:13:36.8241171Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-09-07T06:13:36.8242839Z * [new branch] gh/anijain2305/855/base -> origin/gh/anijain2305/855/base 2025-09-07T06:13:36.8243963Z * [new branch] gh/anijain2305/855/head -> origin/gh/anijain2305/855/head 2025-09-07T06:13:36.8245096Z * [new branch] gh/anijain2305/855/orig -> origin/gh/anijain2305/855/orig 2025-09-07T06:13:36.8246720Z * [new branch] gh/anijain2305/856/base -> origin/gh/anijain2305/856/base 2025-09-07T06:13:36.8247825Z * [new branch] gh/anijain2305/856/head -> origin/gh/anijain2305/856/head 2025-09-07T06:13:36.8249069Z * [new branch] gh/anijain2305/856/orig -> origin/gh/anijain2305/856/orig 2025-09-07T06:13:36.8250578Z * [new branch] gh/anijain2305/857/base -> origin/gh/anijain2305/857/base 2025-09-07T06:13:36.8251625Z * [new branch] gh/anijain2305/857/head -> origin/gh/anijain2305/857/head 2025-09-07T06:13:36.8252883Z * [new branch] gh/anijain2305/857/orig -> origin/gh/anijain2305/857/orig 2025-09-07T06:13:36.8255036Z * [new branch] gh/anijain2305/858/base -> origin/gh/anijain2305/858/base 2025-09-07T06:13:36.8256180Z * [new branch] gh/anijain2305/858/head -> origin/gh/anijain2305/858/head 2025-09-07T06:13:36.8257343Z * [new branch] gh/anijain2305/858/orig -> origin/gh/anijain2305/858/orig 2025-09-07T06:13:36.8259077Z * [new branch] gh/anijain2305/859/base -> origin/gh/anijain2305/859/base 2025-09-07T06:13:36.8260228Z * 
[new branch] gh/anijain2305/859/head -> origin/gh/anijain2305/859/head 2025-09-07T06:13:36.8261387Z * [new branch] gh/anijain2305/859/orig -> origin/gh/anijain2305/859/orig 2025-09-07T06:13:36.8263038Z * [new branch] gh/anijain2305/860/base -> origin/gh/anijain2305/860/base 2025-09-07T06:13:36.8264177Z * [new branch] gh/anijain2305/860/head -> origin/gh/anijain2305/860/head 2025-09-07T06:13:36.8265402Z * [new branch] gh/anijain2305/860/orig -> origin/gh/anijain2305/860/orig 2025-09-07T06:13:36.8267053Z * [new branch] gh/anijain2305/861/base -> origin/gh/anijain2305/861/base 2025-09-07T06:13:36.8268117Z * [new branch] gh/anijain2305/861/head -> origin/gh/anijain2305/861/head 2025-09-07T06:13:36.8269278Z * [new branch] gh/anijain2305/861/orig -> origin/gh/anijain2305/861/orig 2025-09-07T06:13:36.8270965Z * [new branch] gh/anijain2305/862/base -> origin/gh/anijain2305/862/base 2025-09-07T06:13:36.8272078Z * [new branch] gh/anijain2305/862/head -> origin/gh/anijain2305/862/head 2025-09-07T06:13:36.8273267Z * [new branch] gh/anijain2305/862/orig -> origin/gh/anijain2305/862/orig 2025-09-07T06:13:36.8274971Z * [new branch] gh/anijain2305/863/base -> origin/gh/anijain2305/863/base 2025-09-07T06:13:36.8276122Z * [new branch] gh/anijain2305/863/head -> origin/gh/anijain2305/863/head 2025-09-07T06:13:36.8277264Z * [new branch] gh/anijain2305/863/orig -> origin/gh/anijain2305/863/orig 2025-09-07T06:13:36.8279015Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-09-07T06:13:36.8280133Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-09-07T06:13:36.8281264Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-09-07T06:13:36.8282931Z * [new branch] gh/anijain2305/865/base -> origin/gh/anijain2305/865/base 2025-09-07T06:13:36.8284019Z * [new branch] gh/anijain2305/865/head -> origin/gh/anijain2305/865/head 2025-09-07T06:13:36.8285177Z * [new branch] gh/anijain2305/865/orig -> 
origin/gh/anijain2305/865/orig 2025-09-07T06:13:36.8286803Z * [new branch] gh/anijain2305/866/base -> origin/gh/anijain2305/866/base 2025-09-07T06:13:36.8287877Z * [new branch] gh/anijain2305/866/head -> origin/gh/anijain2305/866/head 2025-09-07T06:13:36.8289017Z * [new branch] gh/anijain2305/866/orig -> origin/gh/anijain2305/866/orig 2025-09-07T06:13:36.8291055Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-09-07T06:13:36.8292250Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-09-07T06:13:36.8293870Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-09-07T06:13:36.8296120Z * [new branch] gh/ankitageorge/13/base -> origin/gh/ankitageorge/13/base 2025-09-07T06:13:36.8297176Z * [new branch] gh/ankitageorge/13/head -> origin/gh/ankitageorge/13/head 2025-09-07T06:13:36.8298401Z * [new branch] gh/ankitageorge/13/orig -> origin/gh/ankitageorge/13/orig 2025-09-07T06:13:36.8300212Z * [new branch] gh/ankitageorge/14/base -> origin/gh/ankitageorge/14/base 2025-09-07T06:13:36.8301295Z * [new branch] gh/ankitageorge/14/head -> origin/gh/ankitageorge/14/head 2025-09-07T06:13:36.8302701Z * [new branch] gh/ankitageorge/14/orig -> origin/gh/ankitageorge/14/orig 2025-09-07T06:13:36.8304509Z * [new branch] gh/ankitageorge/15/base -> origin/gh/ankitageorge/15/base 2025-09-07T06:13:36.8305618Z * [new branch] gh/ankitageorge/15/head -> origin/gh/ankitageorge/15/head 2025-09-07T06:13:36.8306825Z * [new branch] gh/ankitageorge/15/orig -> origin/gh/ankitageorge/15/orig 2025-09-07T06:13:36.8308550Z * [new branch] gh/ankitageorge/16/base -> origin/gh/ankitageorge/16/base 2025-09-07T06:13:36.8309721Z * [new branch] gh/ankitageorge/16/head -> origin/gh/ankitageorge/16/head 2025-09-07T06:13:36.8310963Z * [new branch] gh/ankitageorge/16/orig -> origin/gh/ankitageorge/16/orig 2025-09-07T06:13:36.8312807Z * [new branch] gh/ankitageorge/17/base -> origin/gh/ankitageorge/17/base 2025-09-07T06:13:36.8313806Z * [new 
branch] gh/ankitageorge/17/head -> origin/gh/ankitageorge/17/head 2025-09-07T06:13:36.8314964Z * [new branch] gh/ankitageorge/17/orig -> origin/gh/ankitageorge/17/orig 2025-09-07T06:13:36.8316760Z * [new branch] gh/ankitageorge/21/base -> origin/gh/ankitageorge/21/base 2025-09-07T06:13:36.8317803Z * [new branch] gh/ankitageorge/21/head -> origin/gh/ankitageorge/21/head 2025-09-07T06:13:36.8318970Z * [new branch] gh/ankitageorge/21/orig -> origin/gh/ankitageorge/21/orig 2025-09-07T06:13:36.8321072Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 2025-09-07T06:13:36.8322170Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-09-07T06:13:36.8323784Z * [new branch] gh/anshul-si/15/base -> origin/gh/anshul-si/15/base 2025-09-07T06:13:36.8324986Z * [new branch] gh/anshul-si/15/head -> origin/gh/anshul-si/15/head 2025-09-07T06:13:36.8326159Z * [new branch] gh/anshul-si/15/orig -> origin/gh/anshul-si/15/orig 2025-09-07T06:13:36.8328045Z * [new branch] gh/anshul-si/16/base -> origin/gh/anshul-si/16/base 2025-09-07T06:13:36.8329021Z * [new branch] gh/anshul-si/16/head -> origin/gh/anshul-si/16/head 2025-09-07T06:13:36.8330178Z * [new branch] gh/anshul-si/16/orig -> origin/gh/anshul-si/16/orig 2025-09-07T06:13:36.8331961Z * [new branch] gh/anshul-si/17/base -> origin/gh/anshul-si/17/base 2025-09-07T06:13:36.8333581Z * [new branch] gh/anshul-si/17/head -> origin/gh/anshul-si/17/head 2025-09-07T06:13:36.8334969Z * [new branch] gh/anshul-si/17/orig -> origin/gh/anshul-si/17/orig 2025-09-07T06:13:36.8336990Z * [new branch] gh/anshul-si/18/base -> origin/gh/anshul-si/18/base 2025-09-07T06:13:36.8337947Z * [new branch] gh/anshul-si/18/head -> origin/gh/anshul-si/18/head 2025-09-07T06:13:36.8339316Z * [new branch] gh/anshul-si/18/orig -> origin/gh/anshul-si/18/orig 2025-09-07T06:13:36.8341025Z * [new branch] gh/anshul-si/19/base -> origin/gh/anshul-si/19/base 2025-09-07T06:13:36.8342290Z * [new branch] gh/anshul-si/19/head -> 
origin/gh/anshul-si/19/head 2025-09-07T06:13:36.8343718Z * [new branch] gh/anshul-si/19/orig -> origin/gh/anshul-si/19/orig 2025-09-07T06:13:36.8345028Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-09-07T06:13:36.8346147Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-09-07T06:13:36.8348041Z * [new branch] gh/anshul-si/20/base -> origin/gh/anshul-si/20/base 2025-09-07T06:13:36.8349143Z * [new branch] gh/anshul-si/20/head -> origin/gh/anshul-si/20/head 2025-09-07T06:13:36.8350284Z * [new branch] gh/anshul-si/20/orig -> origin/gh/anshul-si/20/orig 2025-09-07T06:13:36.8351851Z * [new branch] gh/anshul-si/21/base -> origin/gh/anshul-si/21/base 2025-09-07T06:13:36.8352995Z * [new branch] gh/anshul-si/21/head -> origin/gh/anshul-si/21/head 2025-09-07T06:13:36.8354148Z * [new branch] gh/anshul-si/21/orig -> origin/gh/anshul-si/21/orig 2025-09-07T06:13:36.8355777Z * [new branch] gh/anshul-si/22/base -> origin/gh/anshul-si/22/base 2025-09-07T06:13:36.8356951Z * [new branch] gh/anshul-si/22/head -> origin/gh/anshul-si/22/head 2025-09-07T06:13:36.8358084Z * [new branch] gh/anshul-si/22/orig -> origin/gh/anshul-si/22/orig 2025-09-07T06:13:36.8359623Z * [new branch] gh/anshul-si/23/base -> origin/gh/anshul-si/23/base 2025-09-07T06:13:36.8360714Z * [new branch] gh/anshul-si/23/head -> origin/gh/anshul-si/23/head 2025-09-07T06:13:36.8361875Z * [new branch] gh/anshul-si/23/orig -> origin/gh/anshul-si/23/orig 2025-09-07T06:13:36.8363490Z * [new branch] gh/anshul-si/24/base -> origin/gh/anshul-si/24/base 2025-09-07T06:13:36.8364679Z * [new branch] gh/anshul-si/24/head -> origin/gh/anshul-si/24/head 2025-09-07T06:13:36.8365805Z * [new branch] gh/anshul-si/24/orig -> origin/gh/anshul-si/24/orig 2025-09-07T06:13:36.8367476Z * [new branch] gh/anshul-si/25/base -> origin/gh/anshul-si/25/base 2025-09-07T06:13:36.8368645Z * [new branch] gh/anshul-si/25/head -> origin/gh/anshul-si/25/head 2025-09-07T06:13:36.8369795Z * [new branch] 
gh/anshul-si/25/orig -> origin/gh/anshul-si/25/orig 2025-09-07T06:13:36.8371375Z * [new branch] gh/anshul-si/26/base -> origin/gh/anshul-si/26/base 2025-09-07T06:13:36.8372469Z * [new branch] gh/anshul-si/26/head -> origin/gh/anshul-si/26/head 2025-09-07T06:13:36.8373976Z * [new branch] gh/anshul-si/26/orig -> origin/gh/anshul-si/26/orig 2025-09-07T06:13:36.8375805Z * [new branch] gh/anshul-si/27/base -> origin/gh/anshul-si/27/base 2025-09-07T06:13:36.8376982Z * [new branch] gh/anshul-si/27/head -> origin/gh/anshul-si/27/head 2025-09-07T06:13:36.8378181Z * [new branch] gh/anshul-si/27/orig -> origin/gh/anshul-si/27/orig 2025-09-07T06:13:36.8379704Z * [new branch] gh/anshul-si/28/base -> origin/gh/anshul-si/28/base 2025-09-07T06:13:36.8380791Z * [new branch] gh/anshul-si/28/head -> origin/gh/anshul-si/28/head 2025-09-07T06:13:36.8382045Z * [new branch] gh/anshul-si/28/orig -> origin/gh/anshul-si/28/orig 2025-09-07T06:13:36.8383558Z * [new branch] gh/anshul-si/29/base -> origin/gh/anshul-si/29/base 2025-09-07T06:13:36.8384926Z * [new branch] gh/anshul-si/29/head -> origin/gh/anshul-si/29/head 2025-09-07T06:13:36.8386120Z * [new branch] gh/anshul-si/29/orig -> origin/gh/anshul-si/29/orig 2025-09-07T06:13:36.8387625Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-09-07T06:13:36.8388664Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-09-07T06:13:36.8390126Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-09-07T06:13:36.8391242Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-09-07T06:13:36.8393813Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-09-07T06:13:36.8395037Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-09-07T06:13:36.8397256Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-09-07T06:13:36.8398376Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-09-07T06:13:36.8400501Z * [new 
branch] gh/bdhirsh/650/base -> origin/gh/bdhirsh/650/base 2025-09-07T06:13:36.8401917Z * [new branch] gh/bdhirsh/650/head -> origin/gh/bdhirsh/650/head 2025-09-07T06:13:36.8403072Z * [new branch] gh/bdhirsh/650/orig -> origin/gh/bdhirsh/650/orig 2025-09-07T06:13:36.8404860Z * [new branch] gh/bdhirsh/663/base -> origin/gh/bdhirsh/663/base 2025-09-07T06:13:36.8406043Z * [new branch] gh/bdhirsh/663/head -> origin/gh/bdhirsh/663/head 2025-09-07T06:13:36.8407191Z * [new branch] gh/bdhirsh/663/orig -> origin/gh/bdhirsh/663/orig 2025-09-07T06:13:36.8408986Z * [new branch] gh/bdhirsh/665/base -> origin/gh/bdhirsh/665/base 2025-09-07T06:13:36.8410038Z * [new branch] gh/bdhirsh/665/head -> origin/gh/bdhirsh/665/head 2025-09-07T06:13:36.8411221Z * [new branch] gh/bdhirsh/665/orig -> origin/gh/bdhirsh/665/orig 2025-09-07T06:13:36.8413507Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-09-07T06:13:36.8414826Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-09-07T06:13:36.8416005Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-09-07T06:13:36.8418025Z * [new branch] gh/bdhirsh/667/base -> origin/gh/bdhirsh/667/base 2025-09-07T06:13:36.8419175Z * [new branch] gh/bdhirsh/667/head -> origin/gh/bdhirsh/667/head 2025-09-07T06:13:36.8420370Z * [new branch] gh/bdhirsh/667/orig -> origin/gh/bdhirsh/667/orig 2025-09-07T06:13:36.8422004Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-09-07T06:13:36.8423129Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-09-07T06:13:36.8424303Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-09-07T06:13:36.8426176Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-09-07T06:13:36.8427304Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-09-07T06:13:36.8428404Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-09-07T06:13:36.8430264Z * [new branch] 
gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-09-07T06:13:36.8431462Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-09-07T06:13:36.8432761Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-09-07T06:13:36.8434737Z * [new branch] gh/benjaminglass1/100/base -> origin/gh/benjaminglass1/100/base 2025-09-07T06:13:36.8435839Z * [new branch] gh/benjaminglass1/100/head -> origin/gh/benjaminglass1/100/head 2025-09-07T06:13:36.8437077Z * [new branch] gh/benjaminglass1/100/orig -> origin/gh/benjaminglass1/100/orig 2025-09-07T06:13:36.8438758Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-09-07T06:13:36.8439867Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-09-07T06:13:36.8441031Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-09-07T06:13:36.8442806Z * [new branch] gh/benjaminglass1/102/base -> origin/gh/benjaminglass1/102/base 2025-09-07T06:13:36.8443776Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-09-07T06:13:36.8444917Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-09-07T06:13:36.8446515Z * [new branch] gh/benjaminglass1/103/base -> origin/gh/benjaminglass1/103/base 2025-09-07T06:13:36.8447575Z * [new branch] gh/benjaminglass1/103/head -> origin/gh/benjaminglass1/103/head 2025-09-07T06:13:36.8448708Z * [new branch] gh/benjaminglass1/103/orig -> origin/gh/benjaminglass1/103/orig 2025-09-07T06:13:36.8450372Z * [new branch] gh/benjaminglass1/104/base -> origin/gh/benjaminglass1/104/base 2025-09-07T06:13:36.8451425Z * [new branch] gh/benjaminglass1/104/head -> origin/gh/benjaminglass1/104/head 2025-09-07T06:13:36.8452592Z * [new branch] gh/benjaminglass1/104/orig -> origin/gh/benjaminglass1/104/orig 2025-09-07T06:13:36.8454530Z * [new branch] gh/benjaminglass1/105/base -> origin/gh/benjaminglass1/105/base 2025-09-07T06:13:36.8455612Z * 
[new branch] gh/benjaminglass1/105/head -> origin/gh/benjaminglass1/105/head 2025-09-07T06:13:36.8456908Z * [new branch] gh/benjaminglass1/105/orig -> origin/gh/benjaminglass1/105/orig 2025-09-07T06:13:36.8458562Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-09-07T06:13:36.8459685Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-09-07T06:13:36.8460882Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-09-07T06:13:36.8462501Z * [new branch] gh/benjaminglass1/79/base -> origin/gh/benjaminglass1/79/base 2025-09-07T06:13:36.8463620Z * [new branch] gh/benjaminglass1/79/head -> origin/gh/benjaminglass1/79/head 2025-09-07T06:13:36.8465044Z * [new branch] gh/benjaminglass1/79/orig -> origin/gh/benjaminglass1/79/orig 2025-09-07T06:13:36.8466693Z * [new branch] gh/benjaminglass1/86/base -> origin/gh/benjaminglass1/86/base 2025-09-07T06:13:36.8467778Z * [new branch] gh/benjaminglass1/86/head -> origin/gh/benjaminglass1/86/head 2025-09-07T06:13:36.8469069Z * [new branch] gh/benjaminglass1/86/orig -> origin/gh/benjaminglass1/86/orig 2025-09-07T06:13:36.8470677Z * [new branch] gh/benjaminglass1/89/base -> origin/gh/benjaminglass1/89/base 2025-09-07T06:13:36.8471731Z * [new branch] gh/benjaminglass1/89/head -> origin/gh/benjaminglass1/89/head 2025-09-07T06:13:36.8472838Z * [new branch] gh/benjaminglass1/89/orig -> origin/gh/benjaminglass1/89/orig 2025-09-07T06:13:36.8474441Z * [new branch] gh/benjaminglass1/91/base -> origin/gh/benjaminglass1/91/base 2025-09-07T06:13:36.8475492Z * [new branch] gh/benjaminglass1/91/head -> origin/gh/benjaminglass1/91/head 2025-09-07T06:13:36.8476598Z * [new branch] gh/benjaminglass1/91/orig -> origin/gh/benjaminglass1/91/orig 2025-09-07T06:13:36.8478250Z * [new branch] gh/benjaminglass1/93/base -> origin/gh/benjaminglass1/93/base 2025-09-07T06:13:36.8479322Z * [new branch] gh/benjaminglass1/93/head -> origin/gh/benjaminglass1/93/head 
2025-09-07T06:13:36.8480551Z * [new branch] gh/benjaminglass1/93/orig -> origin/gh/benjaminglass1/93/orig 2025-09-07T06:13:36.8482164Z * [new branch] gh/benjaminglass1/95/base -> origin/gh/benjaminglass1/95/base 2025-09-07T06:13:36.8483239Z * [new branch] gh/benjaminglass1/95/head -> origin/gh/benjaminglass1/95/head 2025-09-07T06:13:36.8484372Z * [new branch] gh/benjaminglass1/95/orig -> origin/gh/benjaminglass1/95/orig 2025-09-07T06:13:36.8486074Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-09-07T06:13:36.8487079Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-09-07T06:13:36.8488219Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-09-07T06:13:36.8489858Z * [new branch] gh/benjaminglass1/99/base -> origin/gh/benjaminglass1/99/base 2025-09-07T06:13:36.8490934Z * [new branch] gh/benjaminglass1/99/head -> origin/gh/benjaminglass1/99/head 2025-09-07T06:13:36.8494793Z * [new branch] gh/benjaminglass1/99/orig -> origin/gh/benjaminglass1/99/orig 2025-09-07T06:13:36.8496018Z * [new branch] gh/bobrenjc93/514/base -> origin/gh/bobrenjc93/514/base 2025-09-07T06:13:36.8496671Z * [new branch] gh/bobrenjc93/514/head -> origin/gh/bobrenjc93/514/head 2025-09-07T06:13:36.8497328Z * [new branch] gh/bobrenjc93/514/orig -> origin/gh/bobrenjc93/514/orig 2025-09-07T06:13:36.8498753Z * [new branch] gh/bobrenjc93/521/base -> origin/gh/bobrenjc93/521/base 2025-09-07T06:13:36.8499851Z * [new branch] gh/bobrenjc93/521/head -> origin/gh/bobrenjc93/521/head 2025-09-07T06:13:36.8501026Z * [new branch] gh/bobrenjc93/521/orig -> origin/gh/bobrenjc93/521/orig 2025-09-07T06:13:36.8502673Z * [new branch] gh/bobrenjc93/522/base -> origin/gh/bobrenjc93/522/base 2025-09-07T06:13:36.8503783Z * [new branch] gh/bobrenjc93/522/head -> origin/gh/bobrenjc93/522/head 2025-09-07T06:13:36.8505084Z * [new branch] gh/bobrenjc93/522/orig -> origin/gh/bobrenjc93/522/orig 2025-09-07T06:13:36.8506651Z * [new 
branch] gh/bobrenjc93/525/base -> origin/gh/bobrenjc93/525/base 2025-09-07T06:13:36.8507757Z * [new branch] gh/bobrenjc93/525/head -> origin/gh/bobrenjc93/525/head 2025-09-07T06:13:36.8508873Z * [new branch] gh/bobrenjc93/525/orig -> origin/gh/bobrenjc93/525/orig 2025-09-07T06:13:36.8510577Z * [new branch] gh/bobrenjc93/526/base -> origin/gh/bobrenjc93/526/base 2025-09-07T06:13:36.8511666Z * [new branch] gh/bobrenjc93/526/head -> origin/gh/bobrenjc93/526/head 2025-09-07T06:13:36.8512783Z * [new branch] gh/bobrenjc93/526/orig -> origin/gh/bobrenjc93/526/orig 2025-09-07T06:13:36.8514361Z * [new branch] gh/bobrenjc93/527/base -> origin/gh/bobrenjc93/527/base 2025-09-07T06:13:36.8515429Z * [new branch] gh/bobrenjc93/527/head -> origin/gh/bobrenjc93/527/head 2025-09-07T06:13:36.8516547Z * [new branch] gh/bobrenjc93/527/orig -> origin/gh/bobrenjc93/527/orig 2025-09-07T06:13:36.8518455Z * [new branch] gh/bobrenjc93/528/base -> origin/gh/bobrenjc93/528/base 2025-09-07T06:13:36.8519189Z * [new branch] gh/bobrenjc93/528/head -> origin/gh/bobrenjc93/528/head 2025-09-07T06:13:36.8520319Z * [new branch] gh/bobrenjc93/528/orig -> origin/gh/bobrenjc93/528/orig 2025-09-07T06:13:36.8521866Z * [new branch] gh/bobrenjc93/529/base -> origin/gh/bobrenjc93/529/base 2025-09-07T06:13:36.8522929Z * [new branch] gh/bobrenjc93/529/head -> origin/gh/bobrenjc93/529/head 2025-09-07T06:13:36.8524040Z * [new branch] gh/bobrenjc93/529/orig -> origin/gh/bobrenjc93/529/orig 2025-09-07T06:13:36.8525601Z * [new branch] gh/bobrenjc93/535/base -> origin/gh/bobrenjc93/535/base 2025-09-07T06:13:36.8526670Z * [new branch] gh/bobrenjc93/535/head -> origin/gh/bobrenjc93/535/head 2025-09-07T06:13:36.8527778Z * [new branch] gh/bobrenjc93/535/orig -> origin/gh/bobrenjc93/535/orig 2025-09-07T06:13:36.8529408Z * [new branch] gh/bobrenjc93/537/base -> origin/gh/bobrenjc93/537/base 2025-09-07T06:13:36.8530585Z * [new branch] gh/bobrenjc93/537/head -> origin/gh/bobrenjc93/537/head 2025-09-07T06:13:36.8532091Z * [new 
branch] gh/bobrenjc93/537/orig -> origin/gh/bobrenjc93/537/orig 2025-09-07T06:13:36.8534207Z * [new branch] gh/bobrenjc93/539/base -> origin/gh/bobrenjc93/539/base 2025-09-07T06:13:36.8535375Z * [new branch] gh/bobrenjc93/539/head -> origin/gh/bobrenjc93/539/head 2025-09-07T06:13:36.8536589Z * [new branch] gh/bobrenjc93/539/orig -> origin/gh/bobrenjc93/539/orig 2025-09-07T06:13:36.8538328Z * [new branch] gh/bobrenjc93/540/base -> origin/gh/bobrenjc93/540/base 2025-09-07T06:13:36.8539477Z * [new branch] gh/bobrenjc93/540/head -> origin/gh/bobrenjc93/540/head 2025-09-07T06:13:36.8540674Z * [new branch] gh/bobrenjc93/540/orig -> origin/gh/bobrenjc93/540/orig 2025-09-07T06:13:36.8542360Z * [new branch] gh/bobrenjc93/541/base -> origin/gh/bobrenjc93/541/base 2025-09-07T06:13:36.8543465Z * [new branch] gh/bobrenjc93/541/head -> origin/gh/bobrenjc93/541/head 2025-09-07T06:13:36.8544675Z * [new branch] gh/bobrenjc93/541/orig -> origin/gh/bobrenjc93/541/orig 2025-09-07T06:13:36.8546258Z * [new branch] gh/bobrenjc93/542/base -> origin/gh/bobrenjc93/542/base 2025-09-07T06:13:36.8547394Z * [new branch] gh/bobrenjc93/542/head -> origin/gh/bobrenjc93/542/head 2025-09-07T06:13:36.8548539Z * [new branch] gh/bobrenjc93/542/orig -> origin/gh/bobrenjc93/542/orig 2025-09-07T06:13:36.8550090Z * [new branch] gh/bobrenjc93/543/base -> origin/gh/bobrenjc93/543/base 2025-09-07T06:13:36.8551237Z * [new branch] gh/bobrenjc93/543/head -> origin/gh/bobrenjc93/543/head 2025-09-07T06:13:36.8552364Z * [new branch] gh/bobrenjc93/543/orig -> origin/gh/bobrenjc93/543/orig 2025-09-07T06:13:36.8553814Z * [new branch] gh/bobrenjc93/544/base -> origin/gh/bobrenjc93/544/base 2025-09-07T06:13:36.8554912Z * [new branch] gh/bobrenjc93/544/head -> origin/gh/bobrenjc93/544/head 2025-09-07T06:13:36.8556051Z * [new branch] gh/bobrenjc93/544/orig -> origin/gh/bobrenjc93/544/orig 2025-09-07T06:13:36.8557946Z * [new branch] gh/bobrenjc93/545/base -> origin/gh/bobrenjc93/545/base 2025-09-07T06:13:36.8559308Z * [new 
branch] gh/bobrenjc93/545/head -> origin/gh/bobrenjc93/545/head 2025-09-07T06:13:36.8560482Z * [new branch] gh/bobrenjc93/545/orig -> origin/gh/bobrenjc93/545/orig 2025-09-07T06:13:36.8562188Z * [new branch] gh/bobrenjc93/546/base -> origin/gh/bobrenjc93/546/base 2025-09-07T06:13:36.8563284Z * [new branch] gh/bobrenjc93/546/head -> origin/gh/bobrenjc93/546/head 2025-09-07T06:13:36.8564397Z * [new branch] gh/bobrenjc93/546/orig -> origin/gh/bobrenjc93/546/orig 2025-09-07T06:13:36.8566695Z * [new branch] gh/bobrenjc93/547/base -> origin/gh/bobrenjc93/547/base 2025-09-07T06:13:36.8567817Z * [new branch] gh/bobrenjc93/547/head -> origin/gh/bobrenjc93/547/head 2025-09-07T06:13:36.8568987Z * [new branch] gh/bobrenjc93/547/orig -> origin/gh/bobrenjc93/547/orig 2025-09-07T06:13:36.8570509Z * [new branch] gh/bobrenjc93/548/base -> origin/gh/bobrenjc93/548/base 2025-09-07T06:13:36.8571528Z * [new branch] gh/bobrenjc93/548/head -> origin/gh/bobrenjc93/548/head 2025-09-07T06:13:36.8572720Z * [new branch] gh/bobrenjc93/548/orig -> origin/gh/bobrenjc93/548/orig 2025-09-07T06:13:36.8574540Z * [new branch] gh/bobrenjc93/549/base -> origin/gh/bobrenjc93/549/base 2025-09-07T06:13:36.8575768Z * [new branch] gh/bobrenjc93/549/head -> origin/gh/bobrenjc93/549/head 2025-09-07T06:13:36.8576963Z * [new branch] gh/bobrenjc93/549/orig -> origin/gh/bobrenjc93/549/orig 2025-09-07T06:13:36.8579087Z * [new branch] gh/bobrenjc93/550/base -> origin/gh/bobrenjc93/550/base 2025-09-07T06:13:36.8580114Z * [new branch] gh/bobrenjc93/550/head -> origin/gh/bobrenjc93/550/head 2025-09-07T06:13:36.8581312Z * [new branch] gh/bobrenjc93/550/orig -> origin/gh/bobrenjc93/550/orig 2025-09-07T06:13:36.8583259Z * [new branch] gh/bobrenjc93/551/base -> origin/gh/bobrenjc93/551/base 2025-09-07T06:13:36.8584402Z * [new branch] gh/bobrenjc93/551/head -> origin/gh/bobrenjc93/551/head 2025-09-07T06:13:36.8585680Z * [new branch] gh/bobrenjc93/551/orig -> origin/gh/bobrenjc93/551/orig 2025-09-07T06:13:36.8587378Z * [new 
branch] gh/bobrenjc93/552/base -> origin/gh/bobrenjc93/552/base 2025-09-07T06:13:36.8588569Z * [new branch] gh/bobrenjc93/552/head -> origin/gh/bobrenjc93/552/head 2025-09-07T06:13:36.8589691Z * [new branch] gh/bobrenjc93/552/orig -> origin/gh/bobrenjc93/552/orig 2025-09-07T06:13:36.8591196Z * [new branch] gh/bobrenjc93/553/base -> origin/gh/bobrenjc93/553/base 2025-09-07T06:13:36.8592757Z * [new branch] gh/bobrenjc93/553/head -> origin/gh/bobrenjc93/553/head 2025-09-07T06:13:36.8594035Z * [new branch] gh/bobrenjc93/553/orig -> origin/gh/bobrenjc93/553/orig 2025-09-07T06:13:36.8595585Z * [new branch] gh/bobrenjc93/554/base -> origin/gh/bobrenjc93/554/base 2025-09-07T06:13:36.8596713Z * [new branch] gh/bobrenjc93/554/head -> origin/gh/bobrenjc93/554/head 2025-09-07T06:13:36.8597880Z * [new branch] gh/bobrenjc93/554/orig -> origin/gh/bobrenjc93/554/orig 2025-09-07T06:13:36.8599641Z * [new branch] gh/bobrenjc93/555/base -> origin/gh/bobrenjc93/555/base 2025-09-07T06:13:36.8600688Z * [new branch] gh/bobrenjc93/555/head -> origin/gh/bobrenjc93/555/head 2025-09-07T06:13:36.8602088Z * [new branch] gh/bobrenjc93/555/orig -> origin/gh/bobrenjc93/555/orig 2025-09-07T06:13:36.8603723Z * [new branch] gh/bobrenjc93/556/base -> origin/gh/bobrenjc93/556/base 2025-09-07T06:13:36.8604937Z * [new branch] gh/bobrenjc93/556/head -> origin/gh/bobrenjc93/556/head 2025-09-07T06:13:36.8606080Z * [new branch] gh/bobrenjc93/556/orig -> origin/gh/bobrenjc93/556/orig 2025-09-07T06:13:36.8608043Z * [new branch] gh/briancoutinho/2/base -> origin/gh/briancoutinho/2/base 2025-09-07T06:13:36.8609183Z * [new branch] gh/briancoutinho/2/head -> origin/gh/briancoutinho/2/head 2025-09-07T06:13:36.8611141Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-09-07T06:13:36.8612257Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-09-07T06:13:36.8614237Z * [new branch] gh/c00w/48/base -> origin/gh/c00w/48/base 2025-09-07T06:13:36.8615376Z * [new branch] gh/c00w/48/head -> 
origin/gh/c00w/48/head 2025-09-07T06:13:36.8616675Z * [new branch] gh/c00w/48/orig -> origin/gh/c00w/48/orig 2025-09-07T06:13:36.8618358Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-09-07T06:13:36.8619440Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-09-07T06:13:36.8620571Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-09-07T06:13:36.8622078Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-09-07T06:13:36.8623220Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-09-07T06:13:36.8624770Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-09-07T06:13:36.8626186Z * [new branch] gh/c00w/55/base -> origin/gh/c00w/55/base 2025-09-07T06:13:36.8627614Z * [new branch] gh/c00w/55/head -> origin/gh/c00w/55/head 2025-09-07T06:13:36.8628595Z * [new branch] gh/c00w/55/orig -> origin/gh/c00w/55/orig 2025-09-07T06:13:36.8630084Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-09-07T06:13:36.8631233Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-09-07T06:13:36.8632474Z * [new branch] gh/c00w/56/orig -> origin/gh/c00w/56/orig 2025-09-07T06:13:36.8634347Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-09-07T06:13:36.8635630Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-09-07T06:13:36.8636779Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-09-07T06:13:36.8638806Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-09-07T06:13:36.8640016Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-09-07T06:13:36.8641863Z * [new branch] gh/coconutruben/11/base -> origin/gh/coconutruben/11/base 2025-09-07T06:13:36.8643076Z * [new branch] gh/coconutruben/11/head -> origin/gh/coconutruben/11/head 2025-09-07T06:13:36.8644317Z * [new branch] gh/coconutruben/11/orig -> origin/gh/coconutruben/11/orig 2025-09-07T06:13:36.8646788Z * [new branch] gh/coconutruben/12/base -> 
origin/gh/coconutruben/12/base 2025-09-07T06:13:36.8648302Z * [new branch] gh/coconutruben/12/head -> origin/gh/coconutruben/12/head 2025-09-07T06:13:36.8649767Z * [new branch] gh/coconutruben/12/orig -> origin/gh/coconutruben/12/orig 2025-09-07T06:13:36.8651452Z * [new branch] gh/coconutruben/13/base -> origin/gh/coconutruben/13/base 2025-09-07T06:13:36.8652723Z * [new branch] gh/coconutruben/13/head -> origin/gh/coconutruben/13/head 2025-09-07T06:13:36.8654370Z * [new branch] gh/coconutruben/13/orig -> origin/gh/coconutruben/13/orig 2025-09-07T06:13:36.8656031Z * [new branch] gh/coconutruben/14/base -> origin/gh/coconutruben/14/base 2025-09-07T06:13:36.8657293Z * [new branch] gh/coconutruben/14/head -> origin/gh/coconutruben/14/head 2025-09-07T06:13:36.8658493Z * [new branch] gh/coconutruben/14/orig -> origin/gh/coconutruben/14/orig 2025-09-07T06:13:36.8660432Z * [new branch] gh/coconutruben/15/base -> origin/gh/coconutruben/15/base 2025-09-07T06:13:36.8661731Z * [new branch] gh/coconutruben/15/head -> origin/gh/coconutruben/15/head 2025-09-07T06:13:36.8663059Z * [new branch] gh/coconutruben/15/orig -> origin/gh/coconutruben/15/orig 2025-09-07T06:13:36.8664678Z * [new branch] gh/coconutruben/16/base -> origin/gh/coconutruben/16/base 2025-09-07T06:13:36.8665893Z * [new branch] gh/coconutruben/16/head -> origin/gh/coconutruben/16/head 2025-09-07T06:13:36.8667017Z * [new branch] gh/coconutruben/16/orig -> origin/gh/coconutruben/16/orig 2025-09-07T06:13:36.8668853Z * [new branch] gh/coconutruben/17/base -> origin/gh/coconutruben/17/base 2025-09-07T06:13:36.8670319Z * [new branch] gh/coconutruben/17/head -> origin/gh/coconutruben/17/head 2025-09-07T06:13:36.8671475Z * [new branch] gh/coconutruben/17/orig -> origin/gh/coconutruben/17/orig 2025-09-07T06:13:36.8673183Z * [new branch] gh/coconutruben/18/base -> origin/gh/coconutruben/18/base 2025-09-07T06:13:36.8674381Z * [new branch] gh/coconutruben/18/head -> origin/gh/coconutruben/18/head 2025-09-07T06:13:36.8675720Z * 
[new branch] gh/coconutruben/18/orig -> origin/gh/coconutruben/18/orig 2025-09-07T06:13:36.8677541Z * [new branch] gh/coconutruben/19/base -> origin/gh/coconutruben/19/base 2025-09-07T06:13:36.8678847Z * [new branch] gh/coconutruben/19/head -> origin/gh/coconutruben/19/head 2025-09-07T06:13:36.8679965Z * [new branch] gh/coconutruben/19/orig -> origin/gh/coconutruben/19/orig 2025-09-07T06:13:36.8681767Z * [new branch] gh/coconutruben/20/base -> origin/gh/coconutruben/20/base 2025-09-07T06:13:36.8682984Z * [new branch] gh/coconutruben/20/head -> origin/gh/coconutruben/20/head 2025-09-07T06:13:36.8684213Z * [new branch] gh/coconutruben/20/orig -> origin/gh/coconutruben/20/orig 2025-09-07T06:13:36.8685959Z * [new branch] gh/coconutruben/21/base -> origin/gh/coconutruben/21/base 2025-09-07T06:13:36.8687019Z * [new branch] gh/coconutruben/21/head -> origin/gh/coconutruben/21/head 2025-09-07T06:13:36.8688158Z * [new branch] gh/coconutruben/21/orig -> origin/gh/coconutruben/21/orig 2025-09-07T06:13:36.8689854Z * [new branch] gh/coconutruben/22/base -> origin/gh/coconutruben/22/base 2025-09-07T06:13:36.8690943Z * [new branch] gh/coconutruben/22/head -> origin/gh/coconutruben/22/head 2025-09-07T06:13:36.8692496Z * [new branch] gh/coconutruben/22/orig -> origin/gh/coconutruben/22/orig 2025-09-07T06:13:36.8694762Z * [new branch] gh/coconutruben/24/base -> origin/gh/coconutruben/24/base 2025-09-07T06:13:36.8696052Z * [new branch] gh/coconutruben/24/head -> origin/gh/coconutruben/24/head 2025-09-07T06:13:36.8697309Z * [new branch] gh/coconutruben/24/orig -> origin/gh/coconutruben/24/orig 2025-09-07T06:13:36.8699452Z * [new branch] gh/coconutruben/25/base -> origin/gh/coconutruben/25/base 2025-09-07T06:13:36.8701158Z * [new branch] gh/coconutruben/25/head -> origin/gh/coconutruben/25/head 2025-09-07T06:13:36.8702711Z * [new branch] gh/coconutruben/25/orig -> origin/gh/coconutruben/25/orig 2025-09-07T06:13:36.8704511Z * [new branch] gh/coconutruben/28/base -> 
origin/gh/coconutruben/28/base 2025-09-07T06:13:36.8705652Z * [new branch] gh/coconutruben/28/head -> origin/gh/coconutruben/28/head 2025-09-07T06:13:36.8706848Z * [new branch] gh/coconutruben/28/orig -> origin/gh/coconutruben/28/orig 2025-09-07T06:13:36.8708614Z * [new branch] gh/coconutruben/29/base -> origin/gh/coconutruben/29/base 2025-09-07T06:13:36.8709803Z * [new branch] gh/coconutruben/29/head -> origin/gh/coconutruben/29/head 2025-09-07T06:13:36.8711047Z * [new branch] gh/coconutruben/29/orig -> origin/gh/coconutruben/29/orig 2025-09-07T06:13:36.8712770Z * [new branch] gh/coconutruben/30/base -> origin/gh/coconutruben/30/base 2025-09-07T06:13:36.8713947Z * [new branch] gh/coconutruben/30/head -> origin/gh/coconutruben/30/head 2025-09-07T06:13:36.8715161Z * [new branch] gh/coconutruben/30/orig -> origin/gh/coconutruben/30/orig 2025-09-07T06:13:36.8717408Z * [new branch] gh/coconutruben/31/base -> origin/gh/coconutruben/31/base 2025-09-07T06:13:36.8718641Z * [new branch] gh/coconutruben/31/head -> origin/gh/coconutruben/31/head 2025-09-07T06:13:36.8719852Z * [new branch] gh/coconutruben/31/orig -> origin/gh/coconutruben/31/orig 2025-09-07T06:13:36.8721787Z * [new branch] gh/coconutruben/32/base -> origin/gh/coconutruben/32/base 2025-09-07T06:13:36.8722982Z * [new branch] gh/coconutruben/32/head -> origin/gh/coconutruben/32/head 2025-09-07T06:13:36.8724218Z * [new branch] gh/coconutruben/32/orig -> origin/gh/coconutruben/32/orig 2025-09-07T06:13:36.8726146Z * [new branch] gh/coconutruben/33/base -> origin/gh/coconutruben/33/base 2025-09-07T06:13:36.8727287Z * [new branch] gh/coconutruben/33/head -> origin/gh/coconutruben/33/head 2025-09-07T06:13:36.8728656Z * [new branch] gh/coconutruben/33/orig -> origin/gh/coconutruben/33/orig 2025-09-07T06:13:36.8730249Z * [new branch] gh/coconutruben/34/base -> origin/gh/coconutruben/34/base 2025-09-07T06:13:36.8731355Z * [new branch] gh/coconutruben/34/head -> origin/gh/coconutruben/34/head 2025-09-07T06:13:36.8732443Z * 
[new branch] gh/coconutruben/34/orig -> origin/gh/coconutruben/34/orig 2025-09-07T06:13:36.8734508Z * [new branch] gh/coconutruben/35/base -> origin/gh/coconutruben/35/base 2025-09-07T06:13:36.8735717Z * [new branch] gh/coconutruben/35/head -> origin/gh/coconutruben/35/head 2025-09-07T06:13:36.8736961Z * [new branch] gh/coconutruben/35/orig -> origin/gh/coconutruben/35/orig 2025-09-07T06:13:36.8740326Z * [new branch] gh/coconutruben/36/base -> origin/gh/coconutruben/36/base 2025-09-07T06:13:36.8742127Z * [new branch] gh/coconutruben/36/head -> origin/gh/coconutruben/36/head 2025-09-07T06:13:36.8744265Z * [new branch] gh/coconutruben/36/orig -> origin/gh/coconutruben/36/orig 2025-09-07T06:13:36.8746360Z * [new branch] gh/coconutruben/37/base -> origin/gh/coconutruben/37/base 2025-09-07T06:13:36.8747515Z * [new branch] gh/coconutruben/37/head -> origin/gh/coconutruben/37/head 2025-09-07T06:13:36.8748705Z * [new branch] gh/coconutruben/37/orig -> origin/gh/coconutruben/37/orig 2025-09-07T06:13:36.8750439Z * [new branch] gh/coconutruben/38/base -> origin/gh/coconutruben/38/base 2025-09-07T06:13:36.8751977Z * [new branch] gh/coconutruben/38/head -> origin/gh/coconutruben/38/head 2025-09-07T06:13:36.8753179Z * [new branch] gh/coconutruben/38/orig -> origin/gh/coconutruben/38/orig 2025-09-07T06:13:36.8754976Z * [new branch] gh/coconutruben/39/base -> origin/gh/coconutruben/39/base 2025-09-07T06:13:36.8756082Z * [new branch] gh/coconutruben/39/head -> origin/gh/coconutruben/39/head 2025-09-07T06:13:36.8757290Z * [new branch] gh/coconutruben/39/orig -> origin/gh/coconutruben/39/orig 2025-09-07T06:13:36.8759151Z * [new branch] gh/coconutruben/40/base -> origin/gh/coconutruben/40/base 2025-09-07T06:13:36.8760289Z * [new branch] gh/coconutruben/40/head -> origin/gh/coconutruben/40/head 2025-09-07T06:13:36.8761443Z * [new branch] gh/coconutruben/40/orig -> origin/gh/coconutruben/40/orig 2025-09-07T06:13:36.8763349Z * [new branch] gh/coconutruben/41/base -> 
origin/gh/coconutruben/41/base 2025-09-07T06:13:36.8764557Z * [new branch] gh/coconutruben/41/head -> origin/gh/coconutruben/41/head 2025-09-07T06:13:36.8765747Z * [new branch] gh/coconutruben/41/orig -> origin/gh/coconutruben/41/orig 2025-09-07T06:13:36.8767592Z * [new branch] gh/coconutruben/42/base -> origin/gh/coconutruben/42/base 2025-09-07T06:13:36.8768846Z * [new branch] gh/coconutruben/42/head -> origin/gh/coconutruben/42/head 2025-09-07T06:13:36.8770056Z * [new branch] gh/coconutruben/42/orig -> origin/gh/coconutruben/42/orig 2025-09-07T06:13:36.8771886Z * [new branch] gh/coconutruben/43/base -> origin/gh/coconutruben/43/base 2025-09-07T06:13:36.8773422Z * [new branch] gh/coconutruben/43/head -> origin/gh/coconutruben/43/head 2025-09-07T06:13:36.8774723Z * [new branch] gh/coconutruben/43/orig -> origin/gh/coconutruben/43/orig 2025-09-07T06:13:36.8776774Z * [new branch] gh/coconutruben/44/base -> origin/gh/coconutruben/44/base 2025-09-07T06:13:36.8778067Z * [new branch] gh/coconutruben/44/head -> origin/gh/coconutruben/44/head 2025-09-07T06:13:36.8779380Z * [new branch] gh/coconutruben/44/orig -> origin/gh/coconutruben/44/orig 2025-09-07T06:13:36.8781374Z * [new branch] gh/coconutruben/45/base -> origin/gh/coconutruben/45/base 2025-09-07T06:13:36.8782575Z * [new branch] gh/coconutruben/45/head -> origin/gh/coconutruben/45/head 2025-09-07T06:13:36.8783852Z * [new branch] gh/coconutruben/45/orig -> origin/gh/coconutruben/45/orig 2025-09-07T06:13:36.8785627Z * [new branch] gh/coconutruben/46/base -> origin/gh/coconutruben/46/base 2025-09-07T06:13:36.8786830Z * [new branch] gh/coconutruben/46/head -> origin/gh/coconutruben/46/head 2025-09-07T06:13:36.8788074Z * [new branch] gh/coconutruben/46/orig -> origin/gh/coconutruben/46/orig 2025-09-07T06:13:36.8789935Z * [new branch] gh/coconutruben/47/base -> origin/gh/coconutruben/47/base 2025-09-07T06:13:36.8791148Z * [new branch] gh/coconutruben/47/head -> origin/gh/coconutruben/47/head 2025-09-07T06:13:36.8793031Z * 
[new branch] gh/coconutruben/47/orig -> origin/gh/coconutruben/47/orig 2025-09-07T06:13:36.8795500Z * [new branch] gh/coconutruben/48/base -> origin/gh/coconutruben/48/base 2025-09-07T06:13:36.8796774Z * [new branch] gh/coconutruben/48/head -> origin/gh/coconutruben/48/head 2025-09-07T06:13:36.8798027Z * [new branch] gh/coconutruben/48/orig -> origin/gh/coconutruben/48/orig 2025-09-07T06:13:36.8800069Z * [new branch] gh/coconutruben/49/base -> origin/gh/coconutruben/49/base 2025-09-07T06:13:36.8801317Z * [new branch] gh/coconutruben/49/head -> origin/gh/coconutruben/49/head 2025-09-07T06:13:36.8802537Z * [new branch] gh/coconutruben/49/orig -> origin/gh/coconutruben/49/orig 2025-09-07T06:13:36.8804456Z * [new branch] gh/coconutruben/50/base -> origin/gh/coconutruben/50/base 2025-09-07T06:13:36.8805722Z * [new branch] gh/coconutruben/50/head -> origin/gh/coconutruben/50/head 2025-09-07T06:13:36.8806980Z * [new branch] gh/coconutruben/50/orig -> origin/gh/coconutruben/50/orig 2025-09-07T06:13:36.8808578Z * [new branch] gh/coconutruben/51/base -> origin/gh/coconutruben/51/base 2025-09-07T06:13:36.8809786Z * [new branch] gh/coconutruben/51/head -> origin/gh/coconutruben/51/head 2025-09-07T06:13:36.8811035Z * [new branch] gh/coconutruben/51/orig -> origin/gh/coconutruben/51/orig 2025-09-07T06:13:36.8812977Z * [new branch] gh/coconutruben/52/base -> origin/gh/coconutruben/52/base 2025-09-07T06:13:36.8814542Z * [new branch] gh/coconutruben/52/head -> origin/gh/coconutruben/52/head 2025-09-07T06:13:36.8815871Z * [new branch] gh/coconutruben/52/orig -> origin/gh/coconutruben/52/orig 2025-09-07T06:13:36.8817616Z * [new branch] gh/coconutruben/53/base -> origin/gh/coconutruben/53/base 2025-09-07T06:13:36.8818750Z * [new branch] gh/coconutruben/53/head -> origin/gh/coconutruben/53/head 2025-09-07T06:13:36.8819978Z * [new branch] gh/coconutruben/53/orig -> origin/gh/coconutruben/53/orig 2025-09-07T06:13:36.8821722Z * [new branch] gh/coconutruben/54/base -> 
origin/gh/coconutruben/54/base 2025-09-07T06:13:36.8823070Z * [new branch] gh/coconutruben/54/head -> origin/gh/coconutruben/54/head 2025-09-07T06:13:36.8824306Z * [new branch] gh/coconutruben/54/orig -> origin/gh/coconutruben/54/orig 2025-09-07T06:13:36.8826164Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-09-07T06:13:36.8827485Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-09-07T06:13:36.8828727Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-09-07T06:13:36.8830493Z * [new branch] gh/coconutruben/56/base -> origin/gh/coconutruben/56/base 2025-09-07T06:13:36.8831835Z * [new branch] gh/coconutruben/56/head -> origin/gh/coconutruben/56/head 2025-09-07T06:13:36.8832917Z * [new branch] gh/coconutruben/56/orig -> origin/gh/coconutruben/56/orig 2025-09-07T06:13:36.8834660Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-09-07T06:13:36.8835987Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-09-07T06:13:36.8837219Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-09-07T06:13:36.8839256Z * [new branch] gh/coconutruben/58/base -> origin/gh/coconutruben/58/base 2025-09-07T06:13:36.8840578Z * [new branch] gh/coconutruben/58/head -> origin/gh/coconutruben/58/head 2025-09-07T06:13:36.8841760Z * [new branch] gh/coconutruben/58/orig -> origin/gh/coconutruben/58/orig 2025-09-07T06:13:36.8843413Z * [new branch] gh/coconutruben/59/base -> origin/gh/coconutruben/59/base 2025-09-07T06:13:36.8844520Z * [new branch] gh/coconutruben/59/head -> origin/gh/coconutruben/59/head 2025-09-07T06:13:36.8845615Z * [new branch] gh/coconutruben/59/orig -> origin/gh/coconutruben/59/orig 2025-09-07T06:13:36.8847257Z * [new branch] gh/coconutruben/60/base -> origin/gh/coconutruben/60/base 2025-09-07T06:13:36.8848509Z * [new branch] gh/coconutruben/60/head -> origin/gh/coconutruben/60/head 2025-09-07T06:13:36.8849779Z * 
[new branch] gh/coconutruben/60/orig -> origin/gh/coconutruben/60/orig 2025-09-07T06:13:36.8851452Z * [new branch] gh/coconutruben/61/base -> origin/gh/coconutruben/61/base 2025-09-07T06:13:36.8852812Z * [new branch] gh/coconutruben/61/head -> origin/gh/coconutruben/61/head 2025-09-07T06:13:36.8854344Z * [new branch] gh/coconutruben/61/orig -> origin/gh/coconutruben/61/orig 2025-09-07T06:13:36.8856155Z * [new branch] gh/coconutruben/62/base -> origin/gh/coconutruben/62/base 2025-09-07T06:13:36.8857392Z * [new branch] gh/coconutruben/62/head -> origin/gh/coconutruben/62/head 2025-09-07T06:13:36.8858699Z * [new branch] gh/coconutruben/62/orig -> origin/gh/coconutruben/62/orig 2025-09-07T06:13:36.8860567Z * [new branch] gh/coconutruben/63/base -> origin/gh/coconutruben/63/base 2025-09-07T06:13:36.8861810Z * [new branch] gh/coconutruben/63/head -> origin/gh/coconutruben/63/head 2025-09-07T06:13:36.8863016Z * [new branch] gh/coconutruben/63/orig -> origin/gh/coconutruben/63/orig 2025-09-07T06:13:36.8864743Z * [new branch] gh/coconutruben/64/base -> origin/gh/coconutruben/64/base 2025-09-07T06:13:36.8866121Z * [new branch] gh/coconutruben/64/head -> origin/gh/coconutruben/64/head 2025-09-07T06:13:36.8867380Z * [new branch] gh/coconutruben/64/orig -> origin/gh/coconutruben/64/orig 2025-09-07T06:13:36.8869077Z * [new branch] gh/coconutruben/65/base -> origin/gh/coconutruben/65/base 2025-09-07T06:13:36.8870283Z * [new branch] gh/coconutruben/65/head -> origin/gh/coconutruben/65/head 2025-09-07T06:13:36.8871483Z * [new branch] gh/coconutruben/65/orig -> origin/gh/coconutruben/65/orig 2025-09-07T06:13:36.8873315Z * [new branch] gh/coconutruben/66/base -> origin/gh/coconutruben/66/base 2025-09-07T06:13:36.8874201Z * [new branch] gh/coconutruben/66/head -> origin/gh/coconutruben/66/head 2025-09-07T06:13:36.8875374Z * [new branch] gh/coconutruben/66/orig -> origin/gh/coconutruben/66/orig 2025-09-07T06:13:36.8877889Z * [new branch] gh/codingwithsurya/12/base -> 
origin/gh/codingwithsurya/12/base 2025-09-07T06:13:36.8879207Z * [new branch] gh/codingwithsurya/12/head -> origin/gh/codingwithsurya/12/head 2025-09-07T06:13:36.8880646Z * [new branch] gh/codingwithsurya/12/orig -> origin/gh/codingwithsurya/12/orig 2025-09-07T06:13:36.8882046Z * [new branch] gh/codingwithsurya/14/base -> origin/gh/codingwithsurya/14/base 2025-09-07T06:13:36.8883189Z * [new branch] gh/codingwithsurya/14/head -> origin/gh/codingwithsurya/14/head 2025-09-07T06:13:36.8884320Z * [new branch] gh/codingwithsurya/14/orig -> origin/gh/codingwithsurya/14/orig 2025-09-07T06:13:36.8886120Z * [new branch] gh/codingwithsurya/15/base -> origin/gh/codingwithsurya/15/base 2025-09-07T06:13:36.8887365Z * [new branch] gh/codingwithsurya/15/head -> origin/gh/codingwithsurya/15/head 2025-09-07T06:13:36.8888489Z * [new branch] gh/codingwithsurya/15/orig -> origin/gh/codingwithsurya/15/orig 2025-09-07T06:13:36.8890309Z * [new branch] gh/codingwithsurya/16/base -> origin/gh/codingwithsurya/16/base 2025-09-07T06:13:36.8891536Z * [new branch] gh/codingwithsurya/16/head -> origin/gh/codingwithsurya/16/head 2025-09-07T06:13:36.8893199Z * [new branch] gh/codingwithsurya/16/orig -> origin/gh/codingwithsurya/16/orig 2025-09-07T06:13:36.8895202Z * [new branch] gh/codingwithsurya/17/base -> origin/gh/codingwithsurya/17/base 2025-09-07T06:13:36.8896692Z * [new branch] gh/codingwithsurya/17/head -> origin/gh/codingwithsurya/17/head 2025-09-07T06:13:36.8897870Z * [new branch] gh/codingwithsurya/17/orig -> origin/gh/codingwithsurya/17/orig 2025-09-07T06:13:36.8899598Z * [new branch] gh/codingwithsurya/18/base -> origin/gh/codingwithsurya/18/base 2025-09-07T06:13:36.8900994Z * [new branch] gh/codingwithsurya/18/head -> origin/gh/codingwithsurya/18/head 2025-09-07T06:13:36.8902224Z * [new branch] gh/codingwithsurya/18/orig -> origin/gh/codingwithsurya/18/orig 2025-09-07T06:13:36.8904056Z * [new branch] gh/codingwithsurya/19/base -> origin/gh/codingwithsurya/19/base 
2025-09-07T06:13:36.8905422Z * [new branch] gh/codingwithsurya/19/head -> origin/gh/codingwithsurya/19/head 2025-09-07T06:13:36.8906566Z * [new branch] gh/codingwithsurya/19/orig -> origin/gh/codingwithsurya/19/orig 2025-09-07T06:13:36.8908233Z * [new branch] gh/codingwithsurya/20/base -> origin/gh/codingwithsurya/20/base 2025-09-07T06:13:36.8909381Z * [new branch] gh/codingwithsurya/20/head -> origin/gh/codingwithsurya/20/head 2025-09-07T06:13:36.8910530Z * [new branch] gh/codingwithsurya/20/orig -> origin/gh/codingwithsurya/20/orig 2025-09-07T06:13:36.8912401Z * [new branch] gh/codingwithsurya/21/base -> origin/gh/codingwithsurya/21/base 2025-09-07T06:13:36.8913611Z * [new branch] gh/codingwithsurya/21/head -> origin/gh/codingwithsurya/21/head 2025-09-07T06:13:36.8914768Z * [new branch] gh/codingwithsurya/21/orig -> origin/gh/codingwithsurya/21/orig 2025-09-07T06:13:36.8916674Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-09-07T06:13:36.8917818Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-09-07T06:13:36.8919214Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-09-07T06:13:36.8920310Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-09-07T06:13:36.8921745Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-09-07T06:13:36.8922810Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-09-07T06:13:36.8924201Z * [new branch] gh/colinchan15/6/base -> origin/gh/colinchan15/6/base 2025-09-07T06:13:36.8925318Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-09-07T06:13:36.8927341Z * [new branch] gh/davidberard98/382/base -> origin/gh/davidberard98/382/base 2025-09-07T06:13:36.8928775Z * [new branch] gh/davidberard98/382/head -> origin/gh/davidberard98/382/head 2025-09-07T06:13:36.8929831Z * [new branch] gh/davidberard98/382/orig -> origin/gh/davidberard98/382/orig 2025-09-07T06:13:36.8931434Z * 
[new branch] gh/davidberard98/386/base -> origin/gh/davidberard98/386/base 2025-09-07T06:13:36.8932722Z * [new branch] gh/davidberard98/386/head -> origin/gh/davidberard98/386/head 2025-09-07T06:13:36.8934265Z * [new branch] gh/davidberard98/386/orig -> origin/gh/davidberard98/386/orig 2025-09-07T06:13:36.8935882Z * [new branch] gh/davidberard98/391/base -> origin/gh/davidberard98/391/base 2025-09-07T06:13:36.8936968Z * [new branch] gh/davidberard98/391/head -> origin/gh/davidberard98/391/head 2025-09-07T06:13:36.8938181Z * [new branch] gh/davidberard98/391/orig -> origin/gh/davidberard98/391/orig 2025-09-07T06:13:36.8939774Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-09-07T06:13:36.8940932Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-09-07T06:13:36.8942115Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-09-07T06:13:36.8943929Z * [new branch] gh/davidberard98/394/base -> origin/gh/davidberard98/394/base 2025-09-07T06:13:36.8945240Z * [new branch] gh/davidberard98/394/head -> origin/gh/davidberard98/394/head 2025-09-07T06:13:36.8946430Z * [new branch] gh/davidberard98/394/orig -> origin/gh/davidberard98/394/orig 2025-09-07T06:13:36.8948026Z * [new branch] gh/davidberard98/396/base -> origin/gh/davidberard98/396/base 2025-09-07T06:13:36.8949154Z * [new branch] gh/davidberard98/396/head -> origin/gh/davidberard98/396/head 2025-09-07T06:13:36.8950300Z * [new branch] gh/davidberard98/396/orig -> origin/gh/davidberard98/396/orig 2025-09-07T06:13:36.8952120Z * [new branch] gh/davidberard98/397/base -> origin/gh/davidberard98/397/base 2025-09-07T06:13:36.8953299Z * [new branch] gh/davidberard98/397/head -> origin/gh/davidberard98/397/head 2025-09-07T06:13:36.8954527Z * [new branch] gh/davidberard98/397/orig -> origin/gh/davidberard98/397/orig 2025-09-07T06:13:36.8956185Z * [new branch] gh/davidberard98/398/base -> origin/gh/davidberard98/398/base 
2025-09-07T06:13:36.8957267Z * [new branch] gh/davidberard98/398/head -> origin/gh/davidberard98/398/head 2025-09-07T06:13:36.8958439Z * [new branch] gh/davidberard98/398/orig -> origin/gh/davidberard98/398/orig 2025-09-07T06:13:36.8960096Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-09-07T06:13:36.8961319Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-09-07T06:13:36.8962491Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-09-07T06:13:36.8964178Z * [new branch] gh/davidberard98/400/base -> origin/gh/davidberard98/400/base 2025-09-07T06:13:36.8965436Z * [new branch] gh/davidberard98/400/head -> origin/gh/davidberard98/400/head 2025-09-07T06:13:36.8966552Z * [new branch] gh/davidberard98/400/orig -> origin/gh/davidberard98/400/orig 2025-09-07T06:13:36.8968104Z * [new branch] gh/davidberard98/401/base -> origin/gh/davidberard98/401/base 2025-09-07T06:13:36.8969210Z * [new branch] gh/davidberard98/401/head -> origin/gh/davidberard98/401/head 2025-09-07T06:13:36.8970522Z * [new branch] gh/davidberard98/401/orig -> origin/gh/davidberard98/401/orig 2025-09-07T06:13:36.8972078Z * [new branch] gh/davidberard98/402/base -> origin/gh/davidberard98/402/base 2025-09-07T06:13:36.8973680Z * [new branch] gh/davidberard98/402/head -> origin/gh/davidberard98/402/head 2025-09-07T06:13:36.8974791Z * [new branch] gh/davidberard98/402/orig -> origin/gh/davidberard98/402/orig 2025-09-07T06:13:36.8976423Z * [new branch] gh/davidberard98/403/base -> origin/gh/davidberard98/403/base 2025-09-07T06:13:36.8977601Z * [new branch] gh/davidberard98/403/head -> origin/gh/davidberard98/403/head 2025-09-07T06:13:36.8978803Z * [new branch] gh/davidberard98/403/orig -> origin/gh/davidberard98/403/orig 2025-09-07T06:13:36.8980557Z * [new branch] gh/davidberard98/404/base -> origin/gh/davidberard98/404/base 2025-09-07T06:13:36.8981681Z * [new branch] gh/davidberard98/404/head -> 
origin/gh/davidberard98/404/head 2025-09-07T06:13:36.8982832Z * [new branch] gh/davidberard98/404/orig -> origin/gh/davidberard98/404/orig 2025-09-07T06:13:36.8984493Z * [new branch] gh/davidberard98/405/base -> origin/gh/davidberard98/405/base 2025-09-07T06:13:36.8985771Z * [new branch] gh/davidberard98/405/head -> origin/gh/davidberard98/405/head 2025-09-07T06:13:36.8986989Z * [new branch] gh/davidberard98/405/orig -> origin/gh/davidberard98/405/orig 2025-09-07T06:13:36.8988727Z * [new branch] gh/davidberard98/406/base -> origin/gh/davidberard98/406/base 2025-09-07T06:13:36.8990013Z * [new branch] gh/davidberard98/406/head -> origin/gh/davidberard98/406/head 2025-09-07T06:13:36.8991300Z * [new branch] gh/davidberard98/406/orig -> origin/gh/davidberard98/406/orig 2025-09-07T06:13:36.8995212Z * [new branch] gh/davidberard98/407/base -> origin/gh/davidberard98/407/base 2025-09-07T06:13:36.8996409Z * [new branch] gh/davidberard98/407/head -> origin/gh/davidberard98/407/head 2025-09-07T06:13:36.8997580Z * [new branch] gh/davidberard98/407/orig -> origin/gh/davidberard98/407/orig 2025-09-07T06:13:36.8999265Z * [new branch] gh/davidberard98/408/base -> origin/gh/davidberard98/408/base 2025-09-07T06:13:36.9000470Z * [new branch] gh/davidberard98/408/head -> origin/gh/davidberard98/408/head 2025-09-07T06:13:36.9001637Z * [new branch] gh/davidberard98/408/orig -> origin/gh/davidberard98/408/orig 2025-09-07T06:13:36.9003136Z * [new branch] gh/davidberard98/409/base -> origin/gh/davidberard98/409/base 2025-09-07T06:13:36.9004444Z * [new branch] gh/davidberard98/409/head -> origin/gh/davidberard98/409/head 2025-09-07T06:13:36.9005865Z * [new branch] gh/davidberard98/409/orig -> origin/gh/davidberard98/409/orig 2025-09-07T06:13:36.9007702Z * [new branch] gh/desertfire/594/base -> origin/gh/desertfire/594/base 2025-09-07T06:13:36.9008832Z * [new branch] gh/desertfire/594/head -> origin/gh/desertfire/594/head 2025-09-07T06:13:36.9010071Z * [new branch] gh/desertfire/594/orig -> 
origin/gh/desertfire/594/orig 2025-09-07T06:13:36.9011571Z * [new branch] gh/desertfire/595/base -> origin/gh/desertfire/595/base 2025-09-07T06:13:36.9012764Z * [new branch] gh/desertfire/595/head -> origin/gh/desertfire/595/head 2025-09-07T06:13:36.9014283Z * [new branch] gh/desertfire/595/orig -> origin/gh/desertfire/595/orig 2025-09-07T06:13:36.9015778Z * [new branch] gh/desertfire/597/base -> origin/gh/desertfire/597/base 2025-09-07T06:13:36.9016967Z * [new branch] gh/desertfire/597/head -> origin/gh/desertfire/597/head 2025-09-07T06:13:36.9018168Z * [new branch] gh/desertfire/597/orig -> origin/gh/desertfire/597/orig 2025-09-07T06:13:36.9020044Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-09-07T06:13:36.9021263Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 2025-09-07T06:13:36.9023266Z * [new branch] gh/drisspg/149/base -> origin/gh/drisspg/149/base 2025-09-07T06:13:36.9024339Z * [new branch] gh/drisspg/149/head -> origin/gh/drisspg/149/head 2025-09-07T06:13:36.9025611Z * [new branch] gh/drisspg/149/orig -> origin/gh/drisspg/149/orig 2025-09-07T06:13:36.9027145Z * [new branch] gh/drisspg/159/base -> origin/gh/drisspg/159/base 2025-09-07T06:13:36.9028265Z * [new branch] gh/drisspg/159/head -> origin/gh/drisspg/159/head 2025-09-07T06:13:36.9029440Z * [new branch] gh/drisspg/159/orig -> origin/gh/drisspg/159/orig 2025-09-07T06:13:36.9030950Z * [new branch] gh/drisspg/166/base -> origin/gh/drisspg/166/base 2025-09-07T06:13:36.9032084Z * [new branch] gh/drisspg/166/head -> origin/gh/drisspg/166/head 2025-09-07T06:13:36.9033233Z * [new branch] gh/drisspg/166/orig -> origin/gh/drisspg/166/orig 2025-09-07T06:13:36.9034755Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-09-07T06:13:36.9035861Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-09-07T06:13:36.9037021Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-09-07T06:13:36.9038522Z * [new branch] 
gh/drisspg/173/base -> origin/gh/drisspg/173/base 2025-09-07T06:13:36.9039656Z * [new branch] gh/drisspg/173/head -> origin/gh/drisspg/173/head 2025-09-07T06:13:36.9040804Z * [new branch] gh/drisspg/173/orig -> origin/gh/drisspg/173/orig 2025-09-07T06:13:36.9042532Z * [new branch] gh/drisspg/177/base -> origin/gh/drisspg/177/base 2025-09-07T06:13:36.9043685Z * [new branch] gh/drisspg/177/head -> origin/gh/drisspg/177/head 2025-09-07T06:13:36.9044830Z * [new branch] gh/drisspg/177/orig -> origin/gh/drisspg/177/orig 2025-09-07T06:13:36.9046359Z * [new branch] gh/drisspg/178/base -> origin/gh/drisspg/178/base 2025-09-07T06:13:36.9047465Z * [new branch] gh/drisspg/178/head -> origin/gh/drisspg/178/head 2025-09-07T06:13:36.9048456Z * [new branch] gh/drisspg/178/orig -> origin/gh/drisspg/178/orig 2025-09-07T06:13:36.9050086Z * [new branch] gh/drisspg/180/base -> origin/gh/drisspg/180/base 2025-09-07T06:13:36.9051227Z * [new branch] gh/drisspg/180/head -> origin/gh/drisspg/180/head 2025-09-07T06:13:36.9052353Z * [new branch] gh/drisspg/180/orig -> origin/gh/drisspg/180/orig 2025-09-07T06:13:36.9054232Z * [new branch] gh/drisspg/181/base -> origin/gh/drisspg/181/base 2025-09-07T06:13:36.9055431Z * [new branch] gh/drisspg/181/head -> origin/gh/drisspg/181/head 2025-09-07T06:13:36.9056614Z * [new branch] gh/drisspg/181/orig -> origin/gh/drisspg/181/orig 2025-09-07T06:13:36.9058343Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-09-07T06:13:36.9059379Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-09-07T06:13:36.9060785Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-09-07T06:13:36.9061871Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-09-07T06:13:36.9063299Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-09-07T06:13:36.9064372Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-09-07T06:13:36.9066097Z * [new branch] gh/drisspg/185/base -> 
origin/gh/drisspg/185/base 2025-09-07T06:13:36.9067285Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-09-07T06:13:36.9068779Z * [new branch] gh/drisspg/186/base -> origin/gh/drisspg/186/base 2025-09-07T06:13:36.9070016Z * [new branch] gh/drisspg/186/head -> origin/gh/drisspg/186/head 2025-09-07T06:13:36.9071057Z * [new branch] gh/drisspg/186/orig -> origin/gh/drisspg/186/orig 2025-09-07T06:13:36.9072571Z * [new branch] gh/drisspg/187/base -> origin/gh/drisspg/187/base 2025-09-07T06:13:36.9073698Z * [new branch] gh/drisspg/187/head -> origin/gh/drisspg/187/head 2025-09-07T06:13:36.9074842Z * [new branch] gh/drisspg/187/orig -> origin/gh/drisspg/187/orig 2025-09-07T06:13:36.9076357Z * [new branch] gh/drisspg/188/base -> origin/gh/drisspg/188/base 2025-09-07T06:13:36.9077501Z * [new branch] gh/drisspg/188/head -> origin/gh/drisspg/188/head 2025-09-07T06:13:36.9078602Z * [new branch] gh/drisspg/188/orig -> origin/gh/drisspg/188/orig 2025-09-07T06:13:36.9080576Z * [new branch] gh/drisspg/189/base -> origin/gh/drisspg/189/base 2025-09-07T06:13:36.9081731Z * [new branch] gh/drisspg/189/head -> origin/gh/drisspg/189/head 2025-09-07T06:13:36.9082912Z * [new branch] gh/drisspg/189/orig -> origin/gh/drisspg/189/orig 2025-09-07T06:13:36.9084550Z * [new branch] gh/drisspg/190/base -> origin/gh/drisspg/190/base 2025-09-07T06:13:36.9085698Z * [new branch] gh/drisspg/190/head -> origin/gh/drisspg/190/head 2025-09-07T06:13:36.9086825Z * [new branch] gh/drisspg/190/orig -> origin/gh/drisspg/190/orig 2025-09-07T06:13:36.9088441Z * [new branch] gh/drisspg/191/base -> origin/gh/drisspg/191/base 2025-09-07T06:13:36.9089565Z * [new branch] gh/drisspg/191/head -> origin/gh/drisspg/191/head 2025-09-07T06:13:36.9090689Z * [new branch] gh/drisspg/191/orig -> origin/gh/drisspg/191/orig 2025-09-07T06:13:36.9092429Z * [new branch] gh/drisspg/192/base -> origin/gh/drisspg/192/base 2025-09-07T06:13:36.9094048Z * [new branch] gh/drisspg/192/head -> 
origin/gh/drisspg/192/head 2025-09-07T06:13:36.9095167Z * [new branch] gh/drisspg/192/orig -> origin/gh/drisspg/192/orig 2025-09-07T06:13:36.9096805Z * [new branch] gh/drisspg/193/base -> origin/gh/drisspg/193/base 2025-09-07T06:13:36.9097988Z * [new branch] gh/drisspg/193/head -> origin/gh/drisspg/193/head 2025-09-07T06:13:36.9099242Z * [new branch] gh/drisspg/193/orig -> origin/gh/drisspg/193/orig 2025-09-07T06:13:36.9100789Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-09-07T06:13:36.9101947Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-09-07T06:13:36.9103119Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-09-07T06:13:36.9104903Z * [new branch] gh/drisspg/195/base -> origin/gh/drisspg/195/base 2025-09-07T06:13:36.9106042Z * [new branch] gh/drisspg/195/head -> origin/gh/drisspg/195/head 2025-09-07T06:13:36.9107227Z * [new branch] gh/drisspg/195/orig -> origin/gh/drisspg/195/orig 2025-09-07T06:13:36.9108972Z * [new branch] gh/drisspg/196/base -> origin/gh/drisspg/196/base 2025-09-07T06:13:36.9110111Z * [new branch] gh/drisspg/196/head -> origin/gh/drisspg/196/head 2025-09-07T06:13:36.9111268Z * [new branch] gh/drisspg/196/orig -> origin/gh/drisspg/196/orig 2025-09-07T06:13:36.9112812Z * [new branch] gh/drisspg/197/base -> origin/gh/drisspg/197/base 2025-09-07T06:13:36.9113938Z * [new branch] gh/drisspg/197/head -> origin/gh/drisspg/197/head 2025-09-07T06:13:36.9115126Z * [new branch] gh/drisspg/197/orig -> origin/gh/drisspg/197/orig 2025-09-07T06:13:36.9116789Z * [new branch] gh/drisspg/198/base -> origin/gh/drisspg/198/base 2025-09-07T06:13:36.9117818Z * [new branch] gh/drisspg/198/head -> origin/gh/drisspg/198/head 2025-09-07T06:13:36.9118943Z * [new branch] gh/drisspg/198/orig -> origin/gh/drisspg/198/orig 2025-09-07T06:13:36.9120470Z * [new branch] gh/drisspg/199/base -> origin/gh/drisspg/199/base 2025-09-07T06:13:36.9121611Z * [new branch] gh/drisspg/199/head -> 
origin/gh/drisspg/199/head 2025-09-07T06:13:36.9122730Z * [new branch] gh/drisspg/199/orig -> origin/gh/drisspg/199/orig 2025-09-07T06:13:36.9124676Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-09-07T06:13:36.9125820Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-09-07T06:13:36.9127643Z * [new branch] gh/eellison/784/base -> origin/gh/eellison/784/base 2025-09-07T06:13:36.9128814Z * [new branch] gh/eellison/784/head -> origin/gh/eellison/784/head 2025-09-07T06:13:36.9129998Z * [new branch] gh/eellison/784/orig -> origin/gh/eellison/784/orig 2025-09-07T06:13:36.9131955Z * [new branch] gh/eellison/785/base -> origin/gh/eellison/785/base 2025-09-07T06:13:36.9133185Z * [new branch] gh/eellison/785/head -> origin/gh/eellison/785/head 2025-09-07T06:13:36.9134501Z * [new branch] gh/eellison/785/orig -> origin/gh/eellison/785/orig 2025-09-07T06:13:36.9136104Z * [new branch] gh/eellison/789/base -> origin/gh/eellison/789/base 2025-09-07T06:13:36.9137271Z * [new branch] gh/eellison/789/head -> origin/gh/eellison/789/head 2025-09-07T06:13:36.9138431Z * [new branch] gh/eellison/789/orig -> origin/gh/eellison/789/orig 2025-09-07T06:13:36.9139961Z * [new branch] gh/eellison/800/base -> origin/gh/eellison/800/base 2025-09-07T06:13:36.9141138Z * [new branch] gh/eellison/800/head -> origin/gh/eellison/800/head 2025-09-07T06:13:36.9142310Z * [new branch] gh/eellison/800/orig -> origin/gh/eellison/800/orig 2025-09-07T06:13:36.9143893Z * [new branch] gh/eellison/801/base -> origin/gh/eellison/801/base 2025-09-07T06:13:36.9145173Z * [new branch] gh/eellison/801/head -> origin/gh/eellison/801/head 2025-09-07T06:13:36.9146428Z * [new branch] gh/eellison/801/orig -> origin/gh/eellison/801/orig 2025-09-07T06:13:36.9148044Z * [new branch] gh/eellison/802/base -> origin/gh/eellison/802/base 2025-09-07T06:13:36.9149181Z * [new branch] gh/eellison/802/head -> origin/gh/eellison/802/head 2025-09-07T06:13:36.9150308Z * [new branch] 
gh/eellison/802/orig -> origin/gh/eellison/802/orig 2025-09-07T06:13:36.9151812Z * [new branch] gh/eellison/805/base -> origin/gh/eellison/805/base 2025-09-07T06:13:36.9152969Z * [new branch] gh/eellison/805/head -> origin/gh/eellison/805/head 2025-09-07T06:13:36.9154126Z * [new branch] gh/eellison/805/orig -> origin/gh/eellison/805/orig 2025-09-07T06:13:36.9155770Z * [new branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-09-07T06:13:36.9156954Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-09-07T06:13:36.9158104Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-09-07T06:13:36.9159636Z * [new branch] gh/eellison/809/base -> origin/gh/eellison/809/base 2025-09-07T06:13:36.9160784Z * [new branch] gh/eellison/809/head -> origin/gh/eellison/809/head 2025-09-07T06:13:36.9161911Z * [new branch] gh/eellison/809/orig -> origin/gh/eellison/809/orig 2025-09-07T06:13:36.9163493Z * [new branch] gh/eellison/813/base -> origin/gh/eellison/813/base 2025-09-07T06:13:36.9164572Z * [new branch] gh/eellison/813/head -> origin/gh/eellison/813/head 2025-09-07T06:13:36.9165754Z * [new branch] gh/eellison/813/orig -> origin/gh/eellison/813/orig 2025-09-07T06:13:36.9167280Z * [new branch] gh/eellison/814/base -> origin/gh/eellison/814/base 2025-09-07T06:13:36.9168468Z * [new branch] gh/eellison/814/head -> origin/gh/eellison/814/head 2025-09-07T06:13:36.9169620Z * [new branch] gh/eellison/814/orig -> origin/gh/eellison/814/orig 2025-09-07T06:13:36.9171796Z * [new branch] gh/eellison/815/base -> origin/gh/eellison/815/base 2025-09-07T06:13:36.9172737Z * [new branch] gh/eellison/815/head -> origin/gh/eellison/815/head 2025-09-07T06:13:36.9174295Z * [new branch] gh/eellison/815/orig -> origin/gh/eellison/815/orig 2025-09-07T06:13:36.9175878Z * [new branch] gh/eellison/816/base -> origin/gh/eellison/816/base 2025-09-07T06:13:36.9177162Z * [new branch] gh/eellison/816/head -> origin/gh/eellison/816/head 
2025-09-07T06:13:36.9178422Z * [new branch] gh/eellison/816/orig -> origin/gh/eellison/816/orig 2025-09-07T06:13:36.9179990Z * [new branch] gh/eellison/817/base -> origin/gh/eellison/817/base 2025-09-07T06:13:36.9181104Z * [new branch] gh/eellison/817/head -> origin/gh/eellison/817/head 2025-09-07T06:13:36.9182186Z * [new branch] gh/eellison/817/orig -> origin/gh/eellison/817/orig 2025-09-07T06:13:36.9183926Z * [new branch] gh/eellison/818/base -> origin/gh/eellison/818/base 2025-09-07T06:13:36.9185145Z * [new branch] gh/eellison/818/head -> origin/gh/eellison/818/head 2025-09-07T06:13:36.9186268Z * [new branch] gh/eellison/818/orig -> origin/gh/eellison/818/orig 2025-09-07T06:13:36.9188062Z * [new branch] gh/eellison/819/base -> origin/gh/eellison/819/base 2025-09-07T06:13:36.9189129Z * [new branch] gh/eellison/819/head -> origin/gh/eellison/819/head 2025-09-07T06:13:36.9190287Z * [new branch] gh/eellison/819/orig -> origin/gh/eellison/819/orig 2025-09-07T06:13:36.9192990Z * [new branch] gh/eellison/820/base -> origin/gh/eellison/820/base 2025-09-07T06:13:36.9194384Z * [new branch] gh/eellison/820/head -> origin/gh/eellison/820/head 2025-09-07T06:13:36.9195603Z * [new branch] gh/eellison/820/orig -> origin/gh/eellison/820/orig 2025-09-07T06:13:36.9197109Z * [new branch] gh/eellison/821/base -> origin/gh/eellison/821/base 2025-09-07T06:13:36.9198284Z * [new branch] gh/eellison/821/head -> origin/gh/eellison/821/head 2025-09-07T06:13:36.9199535Z * [new branch] gh/eellison/821/orig -> origin/gh/eellison/821/orig 2025-09-07T06:13:36.9201177Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-09-07T06:13:36.9202337Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-09-07T06:13:36.9203519Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-09-07T06:13:36.9205280Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-09-07T06:13:36.9206445Z * [new branch] gh/eellison/823/head -> 
origin/gh/eellison/823/head 2025-09-07T06:13:36.9207570Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-09-07T06:13:36.9209437Z * [new branch] gh/etaf/132/base -> origin/gh/etaf/132/base 2025-09-07T06:13:36.9210590Z * [new branch] gh/etaf/132/head -> origin/gh/etaf/132/head 2025-09-07T06:13:36.9211865Z * [new branch] gh/etaf/132/orig -> origin/gh/etaf/132/orig 2025-09-07T06:13:36.9213634Z * [new branch] gh/etaf/138/base -> origin/gh/etaf/138/base 2025-09-07T06:13:36.9214803Z * [new branch] gh/etaf/138/head -> origin/gh/etaf/138/head 2025-09-07T06:13:36.9215973Z * [new branch] gh/etaf/138/orig -> origin/gh/etaf/138/orig 2025-09-07T06:13:36.9217561Z * [new branch] gh/etaf/140/base -> origin/gh/etaf/140/base 2025-09-07T06:13:36.9218728Z * [new branch] gh/etaf/140/head -> origin/gh/etaf/140/head 2025-09-07T06:13:36.9219904Z * [new branch] gh/etaf/140/orig -> origin/gh/etaf/140/orig 2025-09-07T06:13:36.9221465Z * [new branch] gh/etaf/143/base -> origin/gh/etaf/143/base 2025-09-07T06:13:36.9222622Z * [new branch] gh/etaf/143/head -> origin/gh/etaf/143/head 2025-09-07T06:13:36.9223795Z * [new branch] gh/etaf/143/orig -> origin/gh/etaf/143/orig 2025-09-07T06:13:36.9225586Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-09-07T06:13:36.9226774Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-09-07T06:13:36.9228438Z * [new branch] gh/etaf/151/base -> origin/gh/etaf/151/base 2025-09-07T06:13:36.9229708Z * [new branch] gh/etaf/151/head -> origin/gh/etaf/151/head 2025-09-07T06:13:36.9230901Z * [new branch] gh/etaf/151/orig -> origin/gh/etaf/151/orig 2025-09-07T06:13:36.9232610Z * [new branch] gh/etaf/152/base -> origin/gh/etaf/152/base 2025-09-07T06:13:36.9233820Z * [new branch] gh/etaf/152/head -> origin/gh/etaf/152/head 2025-09-07T06:13:36.9234977Z * [new branch] gh/etaf/152/orig -> origin/gh/etaf/152/orig 2025-09-07T06:13:36.9236664Z * [new branch] gh/etaf/153/base -> origin/gh/etaf/153/base 
2025-09-07T06:13:36.9237848Z * [new branch] gh/etaf/153/head -> origin/gh/etaf/153/head 2025-09-07T06:13:36.9238975Z * [new branch] gh/etaf/153/orig -> origin/gh/etaf/153/orig 2025-09-07T06:13:36.9240738Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-09-07T06:13:36.9241931Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-09-07T06:13:36.9242999Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-09-07T06:13:36.9244654Z * [new branch] gh/etaf/155/base -> origin/gh/etaf/155/base 2025-09-07T06:13:36.9245855Z * [new branch] gh/etaf/155/head -> origin/gh/etaf/155/head 2025-09-07T06:13:36.9247126Z * [new branch] gh/etaf/155/orig -> origin/gh/etaf/155/orig 2025-09-07T06:13:36.9248593Z * [new branch] gh/etaf/156/base -> origin/gh/etaf/156/base 2025-09-07T06:13:36.9249781Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-09-07T06:13:36.9250974Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-09-07T06:13:36.9252784Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-09-07T06:13:36.9254302Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-09-07T06:13:36.9255476Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-09-07T06:13:36.9257050Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-09-07T06:13:36.9258239Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-09-07T06:13:36.9259434Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-09-07T06:13:36.9261215Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-09-07T06:13:36.9262335Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-09-07T06:13:36.9263493Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-09-07T06:13:36.9265229Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-09-07T06:13:36.9266567Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-09-07T06:13:36.9267735Z * [new branch] gh/etaf/160/orig -> 
origin/gh/etaf/160/orig 2025-09-07T06:13:36.9269450Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-09-07T06:13:36.9270609Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-09-07T06:13:36.9271745Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-09-07T06:13:36.9273725Z * [new branch] gh/etaf/162/base -> origin/gh/etaf/162/base 2025-09-07T06:13:36.9274869Z * [new branch] gh/etaf/162/head -> origin/gh/etaf/162/head 2025-09-07T06:13:36.9276013Z * [new branch] gh/etaf/162/orig -> origin/gh/etaf/162/orig 2025-09-07T06:13:36.9277609Z * [new branch] gh/etaf/163/base -> origin/gh/etaf/163/base 2025-09-07T06:13:36.9278753Z * [new branch] gh/etaf/163/head -> origin/gh/etaf/163/head 2025-09-07T06:13:36.9279932Z * [new branch] gh/etaf/163/orig -> origin/gh/etaf/163/orig 2025-09-07T06:13:36.9281583Z * [new branch] gh/etaf/164/base -> origin/gh/etaf/164/base 2025-09-07T06:13:36.9282747Z * [new branch] gh/etaf/164/head -> origin/gh/etaf/164/head 2025-09-07T06:13:36.9283924Z * [new branch] gh/etaf/164/orig -> origin/gh/etaf/164/orig 2025-09-07T06:13:36.9285521Z * [new branch] gh/etaf/165/base -> origin/gh/etaf/165/base 2025-09-07T06:13:36.9286640Z * [new branch] gh/etaf/165/orig -> origin/gh/etaf/165/orig 2025-09-07T06:13:36.9288264Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-09-07T06:13:36.9289455Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-09-07T06:13:36.9290602Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-09-07T06:13:36.9292527Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-09-07T06:13:36.9294189Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-09-07T06:13:36.9295316Z * [new branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-09-07T06:13:36.9297058Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-09-07T06:13:36.9298272Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-09-07T06:13:36.9299524Z * [new 
branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-09-07T06:13:36.9301239Z * [new branch] gh/etaf/169/base -> origin/gh/etaf/169/base 2025-09-07T06:13:36.9302409Z * [new branch] gh/etaf/169/head -> origin/gh/etaf/169/head 2025-09-07T06:13:36.9303594Z * [new branch] gh/etaf/169/orig -> origin/gh/etaf/169/orig 2025-09-07T06:13:36.9305624Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-09-07T06:13:36.9306960Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-09-07T06:13:36.9308394Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-09-07T06:13:36.9309377Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-09-07T06:13:36.9311023Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-09-07T06:13:36.9312029Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-09-07T06:13:36.9313528Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-09-07T06:13:36.9314767Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-09-07T06:13:36.9316622Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-09-07T06:13:36.9317990Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-09-07T06:13:36.9319153Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-09-07T06:13:36.9320735Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-09-07T06:13:36.9321850Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-09-07T06:13:36.9323039Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 2025-09-07T06:13:36.9324538Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-09-07T06:13:36.9325655Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-09-07T06:13:36.9326899Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-09-07T06:13:36.9328462Z * [new 
branch] gh/ezyang/3074/base -> origin/gh/ezyang/3074/base 2025-09-07T06:13:36.9329588Z * [new branch] gh/ezyang/3074/head -> origin/gh/ezyang/3074/head 2025-09-07T06:13:36.9330725Z * [new branch] gh/ezyang/3074/orig -> origin/gh/ezyang/3074/orig 2025-09-07T06:13:36.9332263Z * [new branch] gh/ezyang/3088/base -> origin/gh/ezyang/3088/base 2025-09-07T06:13:36.9333747Z * [new branch] gh/ezyang/3088/head -> origin/gh/ezyang/3088/head 2025-09-07T06:13:36.9334960Z * [new branch] gh/ezyang/3088/orig -> origin/gh/ezyang/3088/orig 2025-09-07T06:13:36.9336594Z * [new branch] gh/ezyang/3092/base -> origin/gh/ezyang/3092/base 2025-09-07T06:13:36.9337866Z * [new branch] gh/ezyang/3092/head -> origin/gh/ezyang/3092/head 2025-09-07T06:13:36.9339059Z * [new branch] gh/ezyang/3092/orig -> origin/gh/ezyang/3092/orig 2025-09-07T06:13:36.9340629Z * [new branch] gh/ezyang/3103/base -> origin/gh/ezyang/3103/base 2025-09-07T06:13:36.9341771Z * [new branch] gh/ezyang/3103/head -> origin/gh/ezyang/3103/head 2025-09-07T06:13:36.9342965Z * [new branch] gh/ezyang/3103/orig -> origin/gh/ezyang/3103/orig 2025-09-07T06:13:36.9344542Z * [new branch] gh/ezyang/3105/base -> origin/gh/ezyang/3105/base 2025-09-07T06:13:36.9345845Z * [new branch] gh/ezyang/3105/head -> origin/gh/ezyang/3105/head 2025-09-07T06:13:36.9346956Z * [new branch] gh/ezyang/3105/orig -> origin/gh/ezyang/3105/orig 2025-09-07T06:13:36.9348483Z * [new branch] gh/ezyang/3114/base -> origin/gh/ezyang/3114/base 2025-09-07T06:13:36.9349690Z * [new branch] gh/ezyang/3114/head -> origin/gh/ezyang/3114/head 2025-09-07T06:13:36.9350839Z * [new branch] gh/ezyang/3114/orig -> origin/gh/ezyang/3114/orig 2025-09-07T06:13:36.9352371Z * [new branch] gh/ezyang/3116/base -> origin/gh/ezyang/3116/base 2025-09-07T06:13:36.9353480Z * [new branch] gh/ezyang/3116/head -> origin/gh/ezyang/3116/head 2025-09-07T06:13:36.9354627Z * [new branch] gh/ezyang/3116/orig -> origin/gh/ezyang/3116/orig 2025-09-07T06:13:36.9356126Z * [new branch] 
gh/ezyang/3120/base -> origin/gh/ezyang/3120/base 2025-09-07T06:13:36.9357674Z * [new branch] gh/ezyang/3120/head -> origin/gh/ezyang/3120/head 2025-09-07T06:13:36.9358454Z * [new branch] gh/ezyang/3120/orig -> origin/gh/ezyang/3120/orig 2025-09-07T06:13:36.9359940Z * [new branch] gh/ezyang/3122/base -> origin/gh/ezyang/3122/base 2025-09-07T06:13:36.9361073Z * [new branch] gh/ezyang/3122/head -> origin/gh/ezyang/3122/head 2025-09-07T06:13:36.9362247Z * [new branch] gh/ezyang/3122/orig -> origin/gh/ezyang/3122/orig 2025-09-07T06:13:36.9363728Z * [new branch] gh/ezyang/3123/base -> origin/gh/ezyang/3123/base 2025-09-07T06:13:36.9364853Z * [new branch] gh/ezyang/3123/head -> origin/gh/ezyang/3123/head 2025-09-07T06:13:36.9365987Z * [new branch] gh/ezyang/3123/orig -> origin/gh/ezyang/3123/orig 2025-09-07T06:13:36.9367536Z * [new branch] gh/ezyang/3125/base -> origin/gh/ezyang/3125/base 2025-09-07T06:13:36.9368640Z * [new branch] gh/ezyang/3125/head -> origin/gh/ezyang/3125/head 2025-09-07T06:13:36.9369799Z * [new branch] gh/ezyang/3125/orig -> origin/gh/ezyang/3125/orig 2025-09-07T06:13:36.9371356Z * [new branch] gh/ezyang/3126/base -> origin/gh/ezyang/3126/base 2025-09-07T06:13:36.9372434Z * [new branch] gh/ezyang/3126/head -> origin/gh/ezyang/3126/head 2025-09-07T06:13:36.9373946Z * [new branch] gh/ezyang/3126/orig -> origin/gh/ezyang/3126/orig 2025-09-07T06:13:36.9376049Z * [new branch] gh/ezyang/3127/base -> origin/gh/ezyang/3127/base 2025-09-07T06:13:36.9377305Z * [new branch] gh/ezyang/3127/head -> origin/gh/ezyang/3127/head 2025-09-07T06:13:36.9378449Z * [new branch] gh/ezyang/3127/orig -> origin/gh/ezyang/3127/orig 2025-09-07T06:13:36.9380135Z * [new branch] gh/ezyang/3128/base -> origin/gh/ezyang/3128/base 2025-09-07T06:13:36.9381335Z * [new branch] gh/ezyang/3128/head -> origin/gh/ezyang/3128/head 2025-09-07T06:13:36.9382487Z * [new branch] gh/ezyang/3128/orig -> origin/gh/ezyang/3128/orig 2025-09-07T06:13:36.9384118Z * [new branch] gh/ezyang/3129/base -> 
origin/gh/ezyang/3129/base 2025-09-07T06:13:36.9385523Z * [new branch] gh/ezyang/3129/head -> origin/gh/ezyang/3129/head 2025-09-07T06:13:36.9386698Z * [new branch] gh/ezyang/3129/orig -> origin/gh/ezyang/3129/orig 2025-09-07T06:13:36.9388239Z * [new branch] gh/ezyang/3130/base -> origin/gh/ezyang/3130/base 2025-09-07T06:13:36.9389371Z * [new branch] gh/ezyang/3130/head -> origin/gh/ezyang/3130/head 2025-09-07T06:13:36.9390506Z * [new branch] gh/ezyang/3130/orig -> origin/gh/ezyang/3130/orig 2025-09-07T06:13:36.9392160Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-09-07T06:13:36.9395403Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-09-07T06:13:36.9396627Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-09-07T06:13:36.9398409Z * [new branch] gh/ezyang/3132/base -> origin/gh/ezyang/3132/base 2025-09-07T06:13:36.9399586Z * [new branch] gh/ezyang/3132/head -> origin/gh/ezyang/3132/head 2025-09-07T06:13:36.9400773Z * [new branch] gh/ezyang/3132/orig -> origin/gh/ezyang/3132/orig 2025-09-07T06:13:36.9402354Z * [new branch] gh/ezyang/3133/base -> origin/gh/ezyang/3133/base 2025-09-07T06:13:36.9403478Z * [new branch] gh/ezyang/3133/head -> origin/gh/ezyang/3133/head 2025-09-07T06:13:36.9404763Z * [new branch] gh/ezyang/3133/orig -> origin/gh/ezyang/3133/orig 2025-09-07T06:13:36.9406399Z * [new branch] gh/ezyang/3134/base -> origin/gh/ezyang/3134/base 2025-09-07T06:13:36.9407635Z * [new branch] gh/ezyang/3134/head -> origin/gh/ezyang/3134/head 2025-09-07T06:13:36.9408658Z * [new branch] gh/ezyang/3134/orig -> origin/gh/ezyang/3134/orig 2025-09-07T06:13:36.9410266Z * [new branch] gh/ezyang/3135/base -> origin/gh/ezyang/3135/base 2025-09-07T06:13:36.9411400Z * [new branch] gh/ezyang/3135/head -> origin/gh/ezyang/3135/head 2025-09-07T06:13:36.9412577Z * [new branch] gh/ezyang/3135/orig -> origin/gh/ezyang/3135/orig 2025-09-07T06:13:36.9414517Z * [new branch] gh/ezyang/3136/base -> 
origin/gh/ezyang/3136/base 2025-09-07T06:13:36.9415691Z * [new branch] gh/ezyang/3136/head -> origin/gh/ezyang/3136/head 2025-09-07T06:13:36.9416808Z * [new branch] gh/ezyang/3136/orig -> origin/gh/ezyang/3136/orig 2025-09-07T06:13:36.9418442Z * [new branch] gh/ezyang/3137/base -> origin/gh/ezyang/3137/base 2025-09-07T06:13:36.9419654Z * [new branch] gh/ezyang/3137/head -> origin/gh/ezyang/3137/head 2025-09-07T06:13:36.9420804Z * [new branch] gh/ezyang/3137/orig -> origin/gh/ezyang/3137/orig 2025-09-07T06:13:36.9422355Z * [new branch] gh/ezyang/3138/base -> origin/gh/ezyang/3138/base 2025-09-07T06:13:36.9423472Z * [new branch] gh/ezyang/3138/head -> origin/gh/ezyang/3138/head 2025-09-07T06:13:36.9424736Z * [new branch] gh/ezyang/3138/orig -> origin/gh/ezyang/3138/orig 2025-09-07T06:13:36.9426383Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-09-07T06:13:36.9427496Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-09-07T06:13:36.9428631Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-09-07T06:13:36.9430188Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-09-07T06:13:36.9431283Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-09-07T06:13:36.9432445Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-09-07T06:13:36.9434033Z * [new branch] gh/ezyang/3141/base -> origin/gh/ezyang/3141/base 2025-09-07T06:13:36.9435078Z * [new branch] gh/ezyang/3141/head -> origin/gh/ezyang/3141/head 2025-09-07T06:13:36.9436244Z * [new branch] gh/ezyang/3141/orig -> origin/gh/ezyang/3141/orig 2025-09-07T06:13:36.9437814Z * [new branch] gh/ezyang/3142/base -> origin/gh/ezyang/3142/base 2025-09-07T06:13:36.9438909Z * [new branch] gh/ezyang/3142/head -> origin/gh/ezyang/3142/head 2025-09-07T06:13:36.9440034Z * [new branch] gh/ezyang/3142/orig -> origin/gh/ezyang/3142/orig 2025-09-07T06:13:36.9441608Z * [new branch] gh/ezyang/3143/base -> 
origin/gh/ezyang/3143/base 2025-09-07T06:13:36.9442726Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-09-07T06:13:36.9443834Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-09-07T06:13:36.9445659Z * [new branch] gh/fadara01/1/base -> origin/gh/fadara01/1/base 2025-09-07T06:13:36.9448382Z * [new branch] gh/fadara01/1/head -> origin/gh/fadara01/1/head 2025-09-07T06:13:36.9449559Z * [new branch] gh/fadara01/1/orig -> origin/gh/fadara01/1/orig 2025-09-07T06:13:36.9451621Z * [new branch] gh/fduwjj/171/base -> origin/gh/fduwjj/171/base 2025-09-07T06:13:36.9452935Z * [new branch] gh/fduwjj/171/head -> origin/gh/fduwjj/171/head 2025-09-07T06:13:36.9454371Z * [new branch] gh/fduwjj/171/orig -> origin/gh/fduwjj/171/orig 2025-09-07T06:13:36.9456321Z * [new branch] gh/fduwjj/175/base -> origin/gh/fduwjj/175/base 2025-09-07T06:13:36.9457711Z * [new branch] gh/fduwjj/175/head -> origin/gh/fduwjj/175/head 2025-09-07T06:13:36.9458896Z * [new branch] gh/fduwjj/175/orig -> origin/gh/fduwjj/175/orig 2025-09-07T06:13:36.9460543Z * [new branch] gh/fduwjj/176/base -> origin/gh/fduwjj/176/base 2025-09-07T06:13:36.9461687Z * [new branch] gh/fduwjj/176/head -> origin/gh/fduwjj/176/head 2025-09-07T06:13:36.9462846Z * [new branch] gh/fduwjj/176/orig -> origin/gh/fduwjj/176/orig 2025-09-07T06:13:36.9464398Z * [new branch] gh/fduwjj/177/base -> origin/gh/fduwjj/177/base 2025-09-07T06:13:36.9465755Z * [new branch] gh/fduwjj/177/head -> origin/gh/fduwjj/177/head 2025-09-07T06:13:36.9466886Z * [new branch] gh/fduwjj/177/orig -> origin/gh/fduwjj/177/orig 2025-09-07T06:13:36.9468485Z * [new branch] gh/fduwjj/178/base -> origin/gh/fduwjj/178/base 2025-09-07T06:13:36.9469696Z * [new branch] gh/fduwjj/178/head -> origin/gh/fduwjj/178/head 2025-09-07T06:13:36.9470808Z * [new branch] gh/fduwjj/178/orig -> origin/gh/fduwjj/178/orig 2025-09-07T06:13:36.9472339Z * [new branch] gh/fduwjj/179/base -> origin/gh/fduwjj/179/base 2025-09-07T06:13:36.9473480Z * [new 
branch] gh/fduwjj/179/head -> origin/gh/fduwjj/179/head 2025-09-07T06:13:36.9474603Z * [new branch] gh/fduwjj/179/orig -> origin/gh/fduwjj/179/orig 2025-09-07T06:13:36.9476186Z * [new branch] gh/fduwjj/180/base -> origin/gh/fduwjj/180/base 2025-09-07T06:13:36.9477357Z * [new branch] gh/fduwjj/180/head -> origin/gh/fduwjj/180/head 2025-09-07T06:13:36.9478456Z * [new branch] gh/fduwjj/180/orig -> origin/gh/fduwjj/180/orig 2025-09-07T06:13:36.9479997Z * [new branch] gh/fduwjj/181/base -> origin/gh/fduwjj/181/base 2025-09-07T06:13:36.9481137Z * [new branch] gh/fduwjj/181/head -> origin/gh/fduwjj/181/head 2025-09-07T06:13:36.9482256Z * [new branch] gh/fduwjj/181/orig -> origin/gh/fduwjj/181/orig 2025-09-07T06:13:36.9483793Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-09-07T06:13:36.9484934Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-09-07T06:13:36.9486083Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-09-07T06:13:36.9487676Z * [new branch] gh/fduwjj/183/base -> origin/gh/fduwjj/183/base 2025-09-07T06:13:36.9489041Z * [new branch] gh/fduwjj/183/head -> origin/gh/fduwjj/183/head 2025-09-07T06:13:36.9490167Z * [new branch] gh/fduwjj/183/orig -> origin/gh/fduwjj/183/orig 2025-09-07T06:13:36.9492196Z * [new branch] gh/fduwjj/184/base -> origin/gh/fduwjj/184/base 2025-09-07T06:13:36.9493826Z * [new branch] gh/fduwjj/184/head -> origin/gh/fduwjj/184/head 2025-09-07T06:13:36.9494942Z * [new branch] gh/fduwjj/184/orig -> origin/gh/fduwjj/184/orig 2025-09-07T06:13:36.9496702Z * [new branch] gh/fduwjj/185/base -> origin/gh/fduwjj/185/base 2025-09-07T06:13:36.9497810Z * [new branch] gh/fduwjj/185/head -> origin/gh/fduwjj/185/head 2025-09-07T06:13:36.9499029Z * [new branch] gh/fduwjj/185/orig -> origin/gh/fduwjj/185/orig 2025-09-07T06:13:36.9500466Z * [new branch] gh/fduwjj/186/base -> origin/gh/fduwjj/186/base 2025-09-07T06:13:36.9501613Z * [new branch] gh/fduwjj/186/head -> origin/gh/fduwjj/186/head 
2025-09-07T06:13:36.9502808Z * [new branch] gh/fduwjj/186/orig -> origin/gh/fduwjj/186/orig 2025-09-07T06:13:36.9504326Z * [new branch] gh/fduwjj/187/base -> origin/gh/fduwjj/187/base 2025-09-07T06:13:36.9505476Z * [new branch] gh/fduwjj/187/head -> origin/gh/fduwjj/187/head 2025-09-07T06:13:36.9506645Z * [new branch] gh/fduwjj/187/orig -> origin/gh/fduwjj/187/orig 2025-09-07T06:13:36.9508018Z * [new branch] gh/fduwjj/188/base -> origin/gh/fduwjj/188/base 2025-09-07T06:13:36.9509149Z * [new branch] gh/fduwjj/188/head -> origin/gh/fduwjj/188/head 2025-09-07T06:13:36.9510203Z * [new branch] gh/fduwjj/188/orig -> origin/gh/fduwjj/188/orig 2025-09-07T06:13:36.9511575Z * [new branch] gh/fduwjj/189/base -> origin/gh/fduwjj/189/base 2025-09-07T06:13:36.9512716Z * [new branch] gh/fduwjj/189/head -> origin/gh/fduwjj/189/head 2025-09-07T06:13:36.9513796Z * [new branch] gh/fduwjj/189/orig -> origin/gh/fduwjj/189/orig 2025-09-07T06:13:36.9515687Z * [new branch] gh/fduwjj/190/base -> origin/gh/fduwjj/190/base 2025-09-07T06:13:36.9516853Z * [new branch] gh/fduwjj/190/head -> origin/gh/fduwjj/190/head 2025-09-07T06:13:36.9518063Z * [new branch] gh/fduwjj/190/orig -> origin/gh/fduwjj/190/orig 2025-09-07T06:13:36.9519464Z * [new branch] gh/fduwjj/191/base -> origin/gh/fduwjj/191/base 2025-09-07T06:13:36.9520751Z * [new branch] gh/fduwjj/191/head -> origin/gh/fduwjj/191/head 2025-09-07T06:13:36.9522029Z * [new branch] gh/fduwjj/191/orig -> origin/gh/fduwjj/191/orig 2025-09-07T06:13:36.9523839Z * [new branch] gh/fegin/306/base -> origin/gh/fegin/306/base 2025-09-07T06:13:36.9525068Z * [new branch] gh/fegin/306/head -> origin/gh/fegin/306/head 2025-09-07T06:13:36.9526111Z * [new branch] gh/fegin/306/orig -> origin/gh/fegin/306/orig 2025-09-07T06:13:36.9527650Z * [new branch] gh/fegin/307/base -> origin/gh/fegin/307/base 2025-09-07T06:13:36.9528740Z * [new branch] gh/fegin/307/head -> origin/gh/fegin/307/head 2025-09-07T06:13:36.9529842Z * [new branch] gh/fegin/307/orig -> 
origin/gh/fegin/307/orig 2025-09-07T06:13:36.9531404Z * [new branch] gh/fegin/308/base -> origin/gh/fegin/308/base 2025-09-07T06:13:36.9532533Z * [new branch] gh/fegin/308/head -> origin/gh/fegin/308/head 2025-09-07T06:13:36.9534145Z * [new branch] gh/fegin/308/orig -> origin/gh/fegin/308/orig 2025-09-07T06:13:36.9535679Z * [new branch] gh/fegin/309/base -> origin/gh/fegin/309/base 2025-09-07T06:13:36.9536901Z * [new branch] gh/fegin/309/head -> origin/gh/fegin/309/head 2025-09-07T06:13:36.9538071Z * [new branch] gh/fegin/309/orig -> origin/gh/fegin/309/orig 2025-09-07T06:13:36.9539625Z * [new branch] gh/fegin/310/base -> origin/gh/fegin/310/base 2025-09-07T06:13:36.9540774Z * [new branch] gh/fegin/310/head -> origin/gh/fegin/310/head 2025-09-07T06:13:36.9543008Z * [new branch] gh/fegin/310/orig -> origin/gh/fegin/310/orig 2025-09-07T06:13:36.9543895Z * [new branch] gh/fegin/311/base -> origin/gh/fegin/311/base 2025-09-07T06:13:36.9544769Z * [new branch] gh/fegin/311/head -> origin/gh/fegin/311/head 2025-09-07T06:13:36.9546334Z * [new branch] gh/fegin/311/orig -> origin/gh/fegin/311/orig 2025-09-07T06:13:36.9547617Z * [new branch] gh/fegin/312/base -> origin/gh/fegin/312/base 2025-09-07T06:13:36.9548736Z * [new branch] gh/fegin/312/head -> origin/gh/fegin/312/head 2025-09-07T06:13:36.9549892Z * [new branch] gh/fegin/312/orig -> origin/gh/fegin/312/orig 2025-09-07T06:13:36.9551514Z * [new branch] gh/fegin/313/base -> origin/gh/fegin/313/base 2025-09-07T06:13:36.9552489Z * [new branch] gh/fegin/313/head -> origin/gh/fegin/313/head 2025-09-07T06:13:36.9553659Z * [new branch] gh/fegin/313/orig -> origin/gh/fegin/313/orig 2025-09-07T06:13:36.9556612Z * [new branch] gh/fffrog/124/base -> origin/gh/fffrog/124/base 2025-09-07T06:13:36.9557271Z * [new branch] gh/fffrog/124/head -> origin/gh/fffrog/124/head 2025-09-07T06:13:36.9557800Z * [new branch] gh/fffrog/124/orig -> origin/gh/fffrog/124/orig 2025-09-07T06:13:36.9559323Z * [new branch] gh/fffrog/129/base -> 
origin/gh/fffrog/129/base 2025-09-07T06:13:36.9560456Z * [new branch] gh/fffrog/129/head -> origin/gh/fffrog/129/head 2025-09-07T06:13:36.9561688Z * [new branch] gh/fffrog/129/orig -> origin/gh/fffrog/129/orig 2025-09-07T06:13:36.9563186Z * [new branch] gh/fffrog/130/base -> origin/gh/fffrog/130/base 2025-09-07T06:13:36.9564327Z * [new branch] gh/fffrog/130/head -> origin/gh/fffrog/130/head 2025-09-07T06:13:36.9565507Z * [new branch] gh/fffrog/130/orig -> origin/gh/fffrog/130/orig 2025-09-07T06:13:36.9567041Z * [new branch] gh/fffrog/131/base -> origin/gh/fffrog/131/base 2025-09-07T06:13:36.9568156Z * [new branch] gh/fffrog/131/head -> origin/gh/fffrog/131/head 2025-09-07T06:13:36.9569332Z * [new branch] gh/fffrog/131/orig -> origin/gh/fffrog/131/orig 2025-09-07T06:13:36.9570854Z * [new branch] gh/fffrog/132/base -> origin/gh/fffrog/132/base 2025-09-07T06:13:36.9572006Z * [new branch] gh/fffrog/132/head -> origin/gh/fffrog/132/head 2025-09-07T06:13:36.9573535Z * [new branch] gh/fffrog/132/orig -> origin/gh/fffrog/132/orig 2025-09-07T06:13:36.9575139Z * [new branch] gh/fffrog/133/base -> origin/gh/fffrog/133/base 2025-09-07T06:13:36.9576347Z * [new branch] gh/fffrog/133/head -> origin/gh/fffrog/133/head 2025-09-07T06:13:36.9577447Z * [new branch] gh/fffrog/133/orig -> origin/gh/fffrog/133/orig 2025-09-07T06:13:36.9578985Z * [new branch] gh/fffrog/134/base -> origin/gh/fffrog/134/base 2025-09-07T06:13:36.9580133Z * [new branch] gh/fffrog/134/head -> origin/gh/fffrog/134/head 2025-09-07T06:13:36.9581317Z * [new branch] gh/fffrog/134/orig -> origin/gh/fffrog/134/orig 2025-09-07T06:13:36.9582895Z * [new branch] gh/fffrog/135/base -> origin/gh/fffrog/135/base 2025-09-07T06:13:36.9584122Z * [new branch] gh/fffrog/135/head -> origin/gh/fffrog/135/head 2025-09-07T06:13:36.9585365Z * [new branch] gh/fffrog/135/orig -> origin/gh/fffrog/135/orig 2025-09-07T06:13:36.9586931Z * [new branch] gh/fffrog/136/base -> origin/gh/fffrog/136/base 2025-09-07T06:13:36.9588054Z * [new 
branch] gh/fffrog/136/head -> origin/gh/fffrog/136/head 2025-09-07T06:13:36.9589284Z * [new branch] gh/fffrog/136/orig -> origin/gh/fffrog/136/orig 2025-09-07T06:13:36.9590814Z * [new branch] gh/fffrog/137/base -> origin/gh/fffrog/137/base 2025-09-07T06:13:36.9592044Z * [new branch] gh/fffrog/137/head -> origin/gh/fffrog/137/head 2025-09-07T06:13:36.9593536Z * [new branch] gh/fffrog/137/orig -> origin/gh/fffrog/137/orig 2025-09-07T06:13:36.9595133Z * [new branch] gh/fffrog/138/base -> origin/gh/fffrog/138/base 2025-09-07T06:13:36.9596268Z * [new branch] gh/fffrog/138/head -> origin/gh/fffrog/138/head 2025-09-07T06:13:36.9597812Z * [new branch] gh/fffrog/138/orig -> origin/gh/fffrog/138/orig 2025-09-07T06:13:36.9599071Z * [new branch] gh/fffrog/139/base -> origin/gh/fffrog/139/base 2025-09-07T06:13:36.9600227Z * [new branch] gh/fffrog/139/head -> origin/gh/fffrog/139/head 2025-09-07T06:13:36.9601458Z * [new branch] gh/fffrog/139/orig -> origin/gh/fffrog/139/orig 2025-09-07T06:13:36.9603062Z * [new branch] gh/fffrog/140/base -> origin/gh/fffrog/140/base 2025-09-07T06:13:36.9604320Z * [new branch] gh/fffrog/140/head -> origin/gh/fffrog/140/head 2025-09-07T06:13:36.9605362Z * [new branch] gh/fffrog/140/orig -> origin/gh/fffrog/140/orig 2025-09-07T06:13:36.9606908Z * [new branch] gh/fffrog/141/base -> origin/gh/fffrog/141/base 2025-09-07T06:13:36.9608006Z * [new branch] gh/fffrog/141/head -> origin/gh/fffrog/141/head 2025-09-07T06:13:36.9609094Z * [new branch] gh/fffrog/141/orig -> origin/gh/fffrog/141/orig 2025-09-07T06:13:36.9610590Z * [new branch] gh/fffrog/142/base -> origin/gh/fffrog/142/base 2025-09-07T06:13:36.9611734Z * [new branch] gh/fffrog/142/head -> origin/gh/fffrog/142/head 2025-09-07T06:13:36.9612904Z * [new branch] gh/fffrog/142/orig -> origin/gh/fffrog/142/orig 2025-09-07T06:13:36.9614796Z * [new branch] gh/fffrog/143/base -> origin/gh/fffrog/143/base 2025-09-07T06:13:36.9615974Z * [new branch] gh/fffrog/143/head -> origin/gh/fffrog/143/head 
2025-09-07T06:13:36.9617161Z * [new branch] gh/fffrog/143/orig -> origin/gh/fffrog/143/orig 2025-09-07T06:13:36.9619127Z * [new branch] gh/fffrog/144/base -> origin/gh/fffrog/144/base 2025-09-07T06:13:36.9620293Z * [new branch] gh/fffrog/144/head -> origin/gh/fffrog/144/head 2025-09-07T06:13:36.9621595Z * [new branch] gh/fffrog/144/orig -> origin/gh/fffrog/144/orig 2025-09-07T06:13:36.9623124Z * [new branch] gh/fffrog/145/base -> origin/gh/fffrog/145/base 2025-09-07T06:13:36.9624359Z * [new branch] gh/fffrog/145/head -> origin/gh/fffrog/145/head 2025-09-07T06:13:36.9625652Z * [new branch] gh/fffrog/145/orig -> origin/gh/fffrog/145/orig 2025-09-07T06:13:36.9627152Z * [new branch] gh/fffrog/146/base -> origin/gh/fffrog/146/base 2025-09-07T06:13:36.9628335Z * [new branch] gh/fffrog/146/head -> origin/gh/fffrog/146/head 2025-09-07T06:13:36.9629471Z * [new branch] gh/fffrog/146/orig -> origin/gh/fffrog/146/orig 2025-09-07T06:13:36.9631053Z * [new branch] gh/fffrog/147/base -> origin/gh/fffrog/147/base 2025-09-07T06:13:36.9632169Z * [new branch] gh/fffrog/147/head -> origin/gh/fffrog/147/head 2025-09-07T06:13:36.9633342Z * [new branch] gh/fffrog/147/orig -> origin/gh/fffrog/147/orig 2025-09-07T06:13:36.9634879Z * [new branch] gh/fffrog/148/base -> origin/gh/fffrog/148/base 2025-09-07T06:13:36.9636014Z * [new branch] gh/fffrog/148/head -> origin/gh/fffrog/148/head 2025-09-07T06:13:36.9663705Z * [new branch] gh/fffrog/148/orig -> origin/gh/fffrog/148/orig 2025-09-07T06:13:36.9664240Z * [new branch] gh/fffrog/149/base -> origin/gh/fffrog/149/base 2025-09-07T06:13:36.9664488Z * [new branch] gh/fffrog/149/head -> origin/gh/fffrog/149/head 2025-09-07T06:13:36.9664722Z * [new branch] gh/fffrog/149/orig -> origin/gh/fffrog/149/orig 2025-09-07T06:13:36.9664952Z * [new branch] gh/fffrog/150/base -> origin/gh/fffrog/150/base 2025-09-07T06:13:36.9665298Z * [new branch] gh/fffrog/150/head -> origin/gh/fffrog/150/head 2025-09-07T06:13:36.9665652Z * [new branch] gh/fffrog/150/orig -> 
origin/gh/fffrog/150/orig 2025-09-07T06:13:36.9665876Z * [new branch] gh/fffrog/151/base -> origin/gh/fffrog/151/base 2025-09-07T06:13:36.9666119Z * [new branch] gh/fffrog/151/head -> origin/gh/fffrog/151/head 2025-09-07T06:13:36.9666342Z * [new branch] gh/fffrog/151/orig -> origin/gh/fffrog/151/orig 2025-09-07T06:13:36.9666568Z * [new branch] gh/fffrog/152/base -> origin/gh/fffrog/152/base 2025-09-07T06:13:36.9666805Z * [new branch] gh/fffrog/152/head -> origin/gh/fffrog/152/head 2025-09-07T06:13:36.9667028Z * [new branch] gh/fffrog/153/base -> origin/gh/fffrog/153/base 2025-09-07T06:13:36.9667249Z * [new branch] gh/fffrog/153/head -> origin/gh/fffrog/153/head 2025-09-07T06:13:36.9667482Z * [new branch] gh/fffrog/153/orig -> origin/gh/fffrog/153/orig 2025-09-07T06:13:36.9667731Z * [new branch] gh/gmagogsfm/1/base -> origin/gh/gmagogsfm/1/base 2025-09-07T06:13:36.9667970Z * [new branch] gh/gmagogsfm/1/head -> origin/gh/gmagogsfm/1/head 2025-09-07T06:13:36.9668277Z * [new branch] gh/gmagogsfm/1/orig -> origin/gh/gmagogsfm/1/orig 2025-09-07T06:13:36.9668509Z * [new branch] gh/gmagogsfm/2/base -> origin/gh/gmagogsfm/2/base 2025-09-07T06:13:36.9668740Z * [new branch] gh/gmagogsfm/2/head -> origin/gh/gmagogsfm/2/head 2025-09-07T06:13:36.9668986Z * [new branch] gh/gmagogsfm/2/orig -> origin/gh/gmagogsfm/2/orig 2025-09-07T06:13:36.9669217Z * [new branch] gh/gmagogsfm/3/base -> origin/gh/gmagogsfm/3/base 2025-09-07T06:13:36.9669450Z * [new branch] gh/gmagogsfm/3/head -> origin/gh/gmagogsfm/3/head 2025-09-07T06:13:36.9669690Z * [new branch] gh/gmagogsfm/3/orig -> origin/gh/gmagogsfm/3/orig 2025-09-07T06:13:36.9670392Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-09-07T06:13:36.9671566Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-09-07T06:13:36.9672730Z * [new branch] gh/guangyey/134/orig -> origin/gh/guangyey/134/orig 2025-09-07T06:13:36.9674252Z * [new branch] gh/guangyey/135/base -> origin/gh/guangyey/135/base 
2025-09-07T06:13:36.9675378Z * [new branch] gh/guangyey/135/head -> origin/gh/guangyey/135/head 2025-09-07T06:13:36.9676547Z * [new branch] gh/guangyey/135/orig -> origin/gh/guangyey/135/orig 2025-09-07T06:13:36.9678091Z * [new branch] gh/guangyey/139/base -> origin/gh/guangyey/139/base 2025-09-07T06:13:36.9679239Z * [new branch] gh/guangyey/139/head -> origin/gh/guangyey/139/head 2025-09-07T06:13:36.9680359Z * [new branch] gh/guangyey/139/orig -> origin/gh/guangyey/139/orig 2025-09-07T06:13:36.9681879Z * [new branch] gh/guangyey/140/base -> origin/gh/guangyey/140/base 2025-09-07T06:13:36.9682984Z * [new branch] gh/guangyey/140/head -> origin/gh/guangyey/140/head 2025-09-07T06:13:36.9684105Z * [new branch] gh/guangyey/140/orig -> origin/gh/guangyey/140/orig 2025-09-07T06:13:36.9685601Z * [new branch] gh/guangyey/142/base -> origin/gh/guangyey/142/base 2025-09-07T06:13:36.9686734Z * [new branch] gh/guangyey/142/head -> origin/gh/guangyey/142/head 2025-09-07T06:13:36.9687848Z * [new branch] gh/guangyey/142/orig -> origin/gh/guangyey/142/orig 2025-09-07T06:13:36.9689414Z * [new branch] gh/guangyey/145/base -> origin/gh/guangyey/145/base 2025-09-07T06:13:36.9690565Z * [new branch] gh/guangyey/145/head -> origin/gh/guangyey/145/head 2025-09-07T06:13:36.9691774Z * [new branch] gh/guangyey/145/orig -> origin/gh/guangyey/145/orig 2025-09-07T06:13:36.9693863Z * [new branch] gh/guangyey/153/base -> origin/gh/guangyey/153/base 2025-09-07T06:13:36.9695033Z * [new branch] gh/guangyey/153/head -> origin/gh/guangyey/153/head 2025-09-07T06:13:36.9696213Z * [new branch] gh/guangyey/153/orig -> origin/gh/guangyey/153/orig 2025-09-07T06:13:36.9697809Z * [new branch] gh/guangyey/159/base -> origin/gh/guangyey/159/base 2025-09-07T06:13:36.9698961Z * [new branch] gh/guangyey/159/head -> origin/gh/guangyey/159/head 2025-09-07T06:13:36.9700149Z * [new branch] gh/guangyey/159/orig -> origin/gh/guangyey/159/orig 2025-09-07T06:13:36.9701740Z * [new branch] gh/guangyey/163/base -> 
origin/gh/guangyey/163/base 2025-09-07T06:13:36.9703007Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-09-07T06:13:36.9704154Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-09-07T06:13:36.9705864Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-09-07T06:13:36.9706993Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-09-07T06:13:36.9708153Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-09-07T06:13:36.9709697Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-09-07T06:13:36.9710817Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-09-07T06:13:36.9711971Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-09-07T06:13:36.9713528Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-09-07T06:13:36.9714651Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-09-07T06:13:36.9715847Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-09-07T06:13:36.9717478Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-09-07T06:13:36.9718592Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-09-07T06:13:36.9719712Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-09-07T06:13:36.9721260Z * [new branch] gh/guangyey/174/base -> origin/gh/guangyey/174/base 2025-09-07T06:13:36.9722355Z * [new branch] gh/guangyey/174/head -> origin/gh/guangyey/174/head 2025-09-07T06:13:36.9723517Z * [new branch] gh/guangyey/174/orig -> origin/gh/guangyey/174/orig 2025-09-07T06:13:36.9725018Z * [new branch] gh/guangyey/176/base -> origin/gh/guangyey/176/base 2025-09-07T06:13:36.9726315Z * [new branch] gh/guangyey/176/head -> origin/gh/guangyey/176/head 2025-09-07T06:13:36.9727467Z * [new branch] gh/guangyey/176/orig -> origin/gh/guangyey/176/orig 2025-09-07T06:13:36.9729000Z * [new branch] 
gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-09-07T06:13:36.9730121Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-09-07T06:13:36.9731321Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-09-07T06:13:36.9733544Z * [new branch] gh/guangyey/181/base -> origin/gh/guangyey/181/base 2025-09-07T06:13:36.9734733Z * [new branch] gh/guangyey/181/head -> origin/gh/guangyey/181/head 2025-09-07T06:13:36.9735932Z * [new branch] gh/guangyey/181/orig -> origin/gh/guangyey/181/orig 2025-09-07T06:13:36.9737560Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-09-07T06:13:36.9738836Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-09-07T06:13:36.9739920Z * [new branch] gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-09-07T06:13:36.9741435Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-09-07T06:13:36.9742741Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-09-07T06:13:36.9743987Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-09-07T06:13:36.9745681Z * [new branch] gh/guangyey/184/base -> origin/gh/guangyey/184/base 2025-09-07T06:13:36.9746826Z * [new branch] gh/guangyey/184/head -> origin/gh/guangyey/184/head 2025-09-07T06:13:36.9747950Z * [new branch] gh/guangyey/184/orig -> origin/gh/guangyey/184/orig 2025-09-07T06:13:36.9749531Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-09-07T06:13:36.9750676Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-09-07T06:13:36.9751829Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-09-07T06:13:36.9753400Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-09-07T06:13:36.9754555Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-09-07T06:13:36.9755649Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 
2025-09-07T06:13:36.9757194Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-09-07T06:13:36.9758324Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-09-07T06:13:36.9759438Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-09-07T06:13:36.9761022Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-09-07T06:13:36.9762148Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-09-07T06:13:36.9763330Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-09-07T06:13:36.9764908Z * [new branch] gh/guangyey/189/base -> origin/gh/guangyey/189/base 2025-09-07T06:13:36.9766029Z * [new branch] gh/guangyey/189/head -> origin/gh/guangyey/189/head 2025-09-07T06:13:36.9767166Z * [new branch] gh/guangyey/189/orig -> origin/gh/guangyey/189/orig 2025-09-07T06:13:36.9768728Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-09-07T06:13:36.9769864Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-09-07T06:13:36.9771040Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-09-07T06:13:36.9772583Z * [new branch] gh/guangyey/191/base -> origin/gh/guangyey/191/base 2025-09-07T06:13:36.9774058Z * [new branch] gh/guangyey/191/head -> origin/gh/guangyey/191/head 2025-09-07T06:13:36.9775213Z * [new branch] gh/guangyey/191/orig -> origin/gh/guangyey/191/orig 2025-09-07T06:13:36.9776872Z * [new branch] gh/guangyey/192/base -> origin/gh/guangyey/192/base 2025-09-07T06:13:36.9778033Z * [new branch] gh/guangyey/192/head -> origin/gh/guangyey/192/head 2025-09-07T06:13:36.9779197Z * [new branch] gh/guangyey/192/orig -> origin/gh/guangyey/192/orig 2025-09-07T06:13:36.9780911Z * [new branch] gh/guangyey/193/base -> origin/gh/guangyey/193/base 2025-09-07T06:13:36.9782077Z * [new branch] gh/guangyey/193/head -> origin/gh/guangyey/193/head 2025-09-07T06:13:36.9783233Z * [new branch] gh/guangyey/193/orig -> 
origin/gh/guangyey/193/orig 2025-09-07T06:13:36.9785046Z * [new branch] gh/guangyey/194/base -> origin/gh/guangyey/194/base 2025-09-07T06:13:36.9786141Z * [new branch] gh/guangyey/194/head -> origin/gh/guangyey/194/head 2025-09-07T06:13:36.9787253Z * [new branch] gh/guangyey/194/orig -> origin/gh/guangyey/194/orig 2025-09-07T06:13:36.9788813Z * [new branch] gh/guangyey/195/base -> origin/gh/guangyey/195/base 2025-09-07T06:13:36.9790080Z * [new branch] gh/guangyey/195/head -> origin/gh/guangyey/195/head 2025-09-07T06:13:36.9791203Z * [new branch] gh/guangyey/195/orig -> origin/gh/guangyey/195/orig 2025-09-07T06:13:36.9794542Z * [new branch] gh/guangyey/196/base -> origin/gh/guangyey/196/base 2025-09-07T06:13:36.9795928Z * [new branch] gh/guangyey/196/head -> origin/gh/guangyey/196/head 2025-09-07T06:13:36.9797153Z * [new branch] gh/guangyey/196/orig -> origin/gh/guangyey/196/orig 2025-09-07T06:13:36.9798833Z * [new branch] gh/guangyey/197/base -> origin/gh/guangyey/197/base 2025-09-07T06:13:36.9800042Z * [new branch] gh/guangyey/197/head -> origin/gh/guangyey/197/head 2025-09-07T06:13:36.9801222Z * [new branch] gh/guangyey/197/orig -> origin/gh/guangyey/197/orig 2025-09-07T06:13:36.9802880Z * [new branch] gh/guangyey/198/base -> origin/gh/guangyey/198/base 2025-09-07T06:13:36.9804060Z * [new branch] gh/guangyey/198/head -> origin/gh/guangyey/198/head 2025-09-07T06:13:36.9805384Z * [new branch] gh/guangyey/198/orig -> origin/gh/guangyey/198/orig 2025-09-07T06:13:36.9807119Z * [new branch] gh/guangyey/199/base -> origin/gh/guangyey/199/base 2025-09-07T06:13:36.9808121Z * [new branch] gh/guangyey/199/head -> origin/gh/guangyey/199/head 2025-09-07T06:13:36.9809242Z * [new branch] gh/guangyey/199/orig -> origin/gh/guangyey/199/orig 2025-09-07T06:13:36.9810817Z * [new branch] gh/guangyey/200/base -> origin/gh/guangyey/200/base 2025-09-07T06:13:36.9811910Z * [new branch] gh/guangyey/200/head -> origin/gh/guangyey/200/head 2025-09-07T06:13:36.9813331Z * [new branch] 
gh/guangyey/200/orig -> origin/gh/guangyey/200/orig 2025-09-07T06:13:36.9815098Z * [new branch] gh/guangyey/201/base -> origin/gh/guangyey/201/base 2025-09-07T06:13:36.9816230Z * [new branch] gh/guangyey/201/head -> origin/gh/guangyey/201/head 2025-09-07T06:13:36.9817450Z * [new branch] gh/guangyey/201/orig -> origin/gh/guangyey/201/orig 2025-09-07T06:13:36.9819059Z * [new branch] gh/guangyey/202/base -> origin/gh/guangyey/202/base 2025-09-07T06:13:36.9820440Z * [new branch] gh/guangyey/202/head -> origin/gh/guangyey/202/head 2025-09-07T06:13:36.9821436Z * [new branch] gh/guangyey/202/orig -> origin/gh/guangyey/202/orig 2025-09-07T06:13:36.9823038Z * [new branch] gh/guangyey/203/base -> origin/gh/guangyey/203/base 2025-09-07T06:13:36.9824165Z * [new branch] gh/guangyey/203/head -> origin/gh/guangyey/203/head 2025-09-07T06:13:36.9825429Z * [new branch] gh/guangyey/203/orig -> origin/gh/guangyey/203/orig 2025-09-07T06:13:36.9826990Z * [new branch] gh/guangyey/204/base -> origin/gh/guangyey/204/base 2025-09-07T06:13:36.9828138Z * [new branch] gh/guangyey/204/head -> origin/gh/guangyey/204/head 2025-09-07T06:13:36.9829293Z * [new branch] gh/guangyey/204/orig -> origin/gh/guangyey/204/orig 2025-09-07T06:13:36.9830843Z * [new branch] gh/guangyey/205/base -> origin/gh/guangyey/205/base 2025-09-07T06:13:36.9831993Z * [new branch] gh/guangyey/205/head -> origin/gh/guangyey/205/head 2025-09-07T06:13:36.9833241Z * [new branch] gh/guangyey/205/orig -> origin/gh/guangyey/205/orig 2025-09-07T06:13:36.9834674Z * [new branch] gh/guangyey/206/base -> origin/gh/guangyey/206/base 2025-09-07T06:13:36.9835791Z * [new branch] gh/guangyey/206/head -> origin/gh/guangyey/206/head 2025-09-07T06:13:36.9836927Z * [new branch] gh/guangyey/206/orig -> origin/gh/guangyey/206/orig 2025-09-07T06:13:36.9838478Z * [new branch] gh/guangyey/207/base -> origin/gh/guangyey/207/base 2025-09-07T06:13:36.9839620Z * [new branch] gh/guangyey/207/head -> origin/gh/guangyey/207/head 
2025-09-07T06:13:36.9840712Z * [new branch] gh/guangyey/207/orig -> origin/gh/guangyey/207/orig 2025-09-07T06:13:36.9842274Z * [new branch] gh/guangyey/79/base -> origin/gh/guangyey/79/base 2025-09-07T06:13:36.9843416Z * [new branch] gh/guangyey/79/head -> origin/gh/guangyey/79/head 2025-09-07T06:13:36.9844533Z * [new branch] gh/guangyey/79/orig -> origin/gh/guangyey/79/orig 2025-09-07T06:13:36.9846025Z * [new branch] gh/guangyey/89/base -> origin/gh/guangyey/89/base 2025-09-07T06:13:36.9847219Z * [new branch] gh/guangyey/89/head -> origin/gh/guangyey/89/head 2025-09-07T06:13:36.9848332Z * [new branch] gh/guangyey/89/orig -> origin/gh/guangyey/89/orig 2025-09-07T06:13:36.9850295Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-09-07T06:13:36.9851443Z * [new branch] gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head 2025-09-07T06:13:36.9852671Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-09-07T06:13:36.9854420Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-09-07T06:13:36.9855584Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-09-07T06:13:36.9856908Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-09-07T06:13:36.9858403Z * [new branch] gh/guilhermeleobas/124/base -> origin/gh/guilhermeleobas/124/base 2025-09-07T06:13:36.9859550Z * [new branch] gh/guilhermeleobas/124/head -> origin/gh/guilhermeleobas/124/head 2025-09-07T06:13:36.9860811Z * [new branch] gh/guilhermeleobas/124/orig -> origin/gh/guilhermeleobas/124/orig 2025-09-07T06:13:36.9862378Z * [new branch] gh/guilhermeleobas/147/base -> origin/gh/guilhermeleobas/147/base 2025-09-07T06:13:36.9863740Z * [new branch] gh/guilhermeleobas/147/head -> origin/gh/guilhermeleobas/147/head 2025-09-07T06:13:36.9865100Z * [new branch] gh/guilhermeleobas/147/orig -> origin/gh/guilhermeleobas/147/orig 
2025-09-07T06:13:36.9866684Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-09-07T06:13:36.9867807Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-09-07T06:13:36.9868933Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-09-07T06:13:36.9870570Z * [new branch] gh/guilhermeleobas/163/base -> origin/gh/guilhermeleobas/163/base 2025-09-07T06:13:36.9871670Z * [new branch] gh/guilhermeleobas/163/head -> origin/gh/guilhermeleobas/163/head 2025-09-07T06:13:36.9872810Z * [new branch] gh/guilhermeleobas/163/orig -> origin/gh/guilhermeleobas/163/orig 2025-09-07T06:13:36.9874357Z * [new branch] gh/guilhermeleobas/164/base -> origin/gh/guilhermeleobas/164/base 2025-09-07T06:13:36.9875489Z * [new branch] gh/guilhermeleobas/164/head -> origin/gh/guilhermeleobas/164/head 2025-09-07T06:13:36.9876647Z * [new branch] gh/guilhermeleobas/164/orig -> origin/gh/guilhermeleobas/164/orig 2025-09-07T06:13:36.9878237Z * [new branch] gh/guilhermeleobas/165/base -> origin/gh/guilhermeleobas/165/base 2025-09-07T06:13:36.9879287Z * [new branch] gh/guilhermeleobas/165/head -> origin/gh/guilhermeleobas/165/head 2025-09-07T06:13:36.9880414Z * [new branch] gh/guilhermeleobas/165/orig -> origin/gh/guilhermeleobas/165/orig 2025-09-07T06:13:36.9882148Z * [new branch] gh/guilhermeleobas/166/base -> origin/gh/guilhermeleobas/166/base 2025-09-07T06:13:36.9883120Z * [new branch] gh/guilhermeleobas/166/head -> origin/gh/guilhermeleobas/166/head 2025-09-07T06:13:36.9884279Z * [new branch] gh/guilhermeleobas/166/orig -> origin/gh/guilhermeleobas/166/orig 2025-09-07T06:13:36.9885785Z * [new branch] gh/guilhermeleobas/167/base -> origin/gh/guilhermeleobas/167/base 2025-09-07T06:13:36.9886922Z * [new branch] gh/guilhermeleobas/167/head -> origin/gh/guilhermeleobas/167/head 2025-09-07T06:13:36.9888095Z * [new branch] gh/guilhermeleobas/167/orig -> origin/gh/guilhermeleobas/167/orig 
2025-09-07T06:13:36.9889603Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-09-07T06:13:36.9890745Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-09-07T06:13:36.9891996Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-09-07T06:13:36.9894082Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-09-07T06:13:36.9895304Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-09-07T06:13:36.9896482Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-09-07T06:13:36.9898017Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-09-07T06:13:36.9899353Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-09-07T06:13:36.9900603Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-09-07T06:13:36.9902154Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-09-07T06:13:36.9903365Z * [new branch] gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-09-07T06:13:36.9904600Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-09-07T06:13:36.9906128Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-09-07T06:13:36.9907238Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-09-07T06:13:36.9908418Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-09-07T06:13:36.9909901Z * [new branch] gh/guilhermeleobas/192/base -> origin/gh/guilhermeleobas/192/base 2025-09-07T06:13:36.9911102Z * [new branch] gh/guilhermeleobas/192/head -> origin/gh/guilhermeleobas/192/head 2025-09-07T06:13:36.9912226Z * [new branch] gh/guilhermeleobas/192/orig -> origin/gh/guilhermeleobas/192/orig 
2025-09-07T06:13:36.9914222Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-09-07T06:13:36.9915356Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-09-07T06:13:36.9916557Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-09-07T06:13:36.9918133Z * [new branch] gh/guilhermeleobas/194/base -> origin/gh/guilhermeleobas/194/base 2025-09-07T06:13:36.9919449Z * [new branch] gh/guilhermeleobas/194/head -> origin/gh/guilhermeleobas/194/head 2025-09-07T06:13:36.9920572Z * [new branch] gh/guilhermeleobas/194/orig -> origin/gh/guilhermeleobas/194/orig 2025-09-07T06:13:36.9922375Z * [new branch] gh/guilhermeleobas/203/base -> origin/gh/guilhermeleobas/203/base 2025-09-07T06:13:36.9923233Z * [new branch] gh/guilhermeleobas/203/head -> origin/gh/guilhermeleobas/203/head 2025-09-07T06:13:36.9924440Z * [new branch] gh/guilhermeleobas/203/orig -> origin/gh/guilhermeleobas/203/orig 2025-09-07T06:13:36.9925940Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-09-07T06:13:36.9927098Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-09-07T06:13:36.9928243Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-09-07T06:13:36.9929918Z * [new branch] gh/guilhermeleobas/205/base -> origin/gh/guilhermeleobas/205/base 2025-09-07T06:13:36.9931084Z * [new branch] gh/guilhermeleobas/205/head -> origin/gh/guilhermeleobas/205/head 2025-09-07T06:13:36.9932288Z * [new branch] gh/guilhermeleobas/205/orig -> origin/gh/guilhermeleobas/205/orig 2025-09-07T06:13:36.9934221Z * [new branch] gh/guilhermeleobas/209/base -> origin/gh/guilhermeleobas/209/base 2025-09-07T06:13:36.9935376Z * [new branch] gh/guilhermeleobas/209/head -> origin/gh/guilhermeleobas/209/head 2025-09-07T06:13:36.9936619Z * [new branch] gh/guilhermeleobas/209/orig -> origin/gh/guilhermeleobas/209/orig 
2025-09-07T06:13:36.9938242Z * [new branch] gh/guilhermeleobas/210/base -> origin/gh/guilhermeleobas/210/base 2025-09-07T06:13:36.9939456Z * [new branch] gh/guilhermeleobas/210/head -> origin/gh/guilhermeleobas/210/head 2025-09-07T06:13:36.9940616Z * [new branch] gh/guilhermeleobas/210/orig -> origin/gh/guilhermeleobas/210/orig 2025-09-07T06:13:36.9942286Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-09-07T06:13:36.9943468Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-09-07T06:13:36.9944631Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-09-07T06:13:36.9946385Z * [new branch] gh/guilhermeleobas/214/base -> origin/gh/guilhermeleobas/214/base 2025-09-07T06:13:36.9947542Z * [new branch] gh/guilhermeleobas/214/head -> origin/gh/guilhermeleobas/214/head 2025-09-07T06:13:36.9948635Z * [new branch] gh/guilhermeleobas/214/orig -> origin/gh/guilhermeleobas/214/orig 2025-09-07T06:13:36.9950202Z * [new branch] gh/guilhermeleobas/215/base -> origin/gh/guilhermeleobas/215/base 2025-09-07T06:13:36.9951340Z * [new branch] gh/guilhermeleobas/215/head -> origin/gh/guilhermeleobas/215/head 2025-09-07T06:13:36.9952527Z * [new branch] gh/guilhermeleobas/215/orig -> origin/gh/guilhermeleobas/215/orig 2025-09-07T06:13:36.9954094Z * [new branch] gh/guilhermeleobas/216/base -> origin/gh/guilhermeleobas/216/base 2025-09-07T06:13:36.9955261Z * [new branch] gh/guilhermeleobas/216/head -> origin/gh/guilhermeleobas/216/head 2025-09-07T06:13:36.9956369Z * [new branch] gh/guilhermeleobas/216/orig -> origin/gh/guilhermeleobas/216/orig 2025-09-07T06:13:36.9957988Z * [new branch] gh/guilhermeleobas/217/base -> origin/gh/guilhermeleobas/217/base 2025-09-07T06:13:36.9959127Z * [new branch] gh/guilhermeleobas/217/head -> origin/gh/guilhermeleobas/217/head 2025-09-07T06:13:36.9960267Z * [new branch] gh/guilhermeleobas/217/orig -> origin/gh/guilhermeleobas/217/orig 
2025-09-07T06:13:36.9961804Z * [new branch] gh/guilhermeleobas/219/base -> origin/gh/guilhermeleobas/219/base 2025-09-07T06:13:36.9962912Z * [new branch] gh/guilhermeleobas/219/head -> origin/gh/guilhermeleobas/219/head 2025-09-07T06:13:36.9964038Z * [new branch] gh/guilhermeleobas/219/orig -> origin/gh/guilhermeleobas/219/orig 2025-09-07T06:13:36.9965718Z * [new branch] gh/guilhermeleobas/220/base -> origin/gh/guilhermeleobas/220/base 2025-09-07T06:13:36.9966777Z * [new branch] gh/guilhermeleobas/220/head -> origin/gh/guilhermeleobas/220/head 2025-09-07T06:13:36.9967890Z * [new branch] gh/guilhermeleobas/220/orig -> origin/gh/guilhermeleobas/220/orig 2025-09-07T06:13:36.9969475Z * [new branch] gh/guilhermeleobas/221/base -> origin/gh/guilhermeleobas/221/base 2025-09-07T06:13:36.9970572Z * [new branch] gh/guilhermeleobas/221/head -> origin/gh/guilhermeleobas/221/head 2025-09-07T06:13:36.9971697Z * [new branch] gh/guilhermeleobas/221/orig -> origin/gh/guilhermeleobas/221/orig 2025-09-07T06:13:36.9973617Z * [new branch] gh/guilhermeleobas/222/base -> origin/gh/guilhermeleobas/222/base 2025-09-07T06:13:36.9974798Z * [new branch] gh/guilhermeleobas/222/head -> origin/gh/guilhermeleobas/222/head 2025-09-07T06:13:36.9976003Z * [new branch] gh/guilhermeleobas/222/orig -> origin/gh/guilhermeleobas/222/orig 2025-09-07T06:13:36.9977594Z * [new branch] gh/guilhermeleobas/223/base -> origin/gh/guilhermeleobas/223/base 2025-09-07T06:13:36.9978908Z * [new branch] gh/guilhermeleobas/223/head -> origin/gh/guilhermeleobas/223/head 2025-09-07T06:13:36.9980178Z * [new branch] gh/guilhermeleobas/223/orig -> origin/gh/guilhermeleobas/223/orig 2025-09-07T06:13:36.9981771Z * [new branch] gh/guilhermeleobas/224/base -> origin/gh/guilhermeleobas/224/base 2025-09-07T06:13:36.9982931Z * [new branch] gh/guilhermeleobas/224/head -> origin/gh/guilhermeleobas/224/head 2025-09-07T06:13:36.9984093Z * [new branch] gh/guilhermeleobas/224/orig -> origin/gh/guilhermeleobas/224/orig 
2025-09-07T06:13:36.9985768Z * [new branch] gh/guilhermeleobas/225/base -> origin/gh/guilhermeleobas/225/base 2025-09-07T06:13:36.9986981Z * [new branch] gh/guilhermeleobas/225/head -> origin/gh/guilhermeleobas/225/head 2025-09-07T06:13:36.9988125Z * [new branch] gh/guilhermeleobas/225/orig -> origin/gh/guilhermeleobas/225/orig 2025-09-07T06:13:36.9989615Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 2025-09-07T06:13:36.9990714Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-09-07T06:13:36.9991816Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-09-07T06:13:36.9996978Z * [new branch] gh/guilhermeleobas/227/base -> origin/gh/guilhermeleobas/227/base 2025-09-07T06:13:36.9998205Z * [new branch] gh/guilhermeleobas/227/head -> origin/gh/guilhermeleobas/227/head 2025-09-07T06:13:36.9999449Z * [new branch] gh/guilhermeleobas/227/orig -> origin/gh/guilhermeleobas/227/orig 2025-09-07T06:13:37.0001248Z * [new branch] gh/guilhermeleobas/228/base -> origin/gh/guilhermeleobas/228/base 2025-09-07T06:13:37.0002449Z * [new branch] gh/guilhermeleobas/228/head -> origin/gh/guilhermeleobas/228/head 2025-09-07T06:13:37.0003526Z * [new branch] gh/guilhermeleobas/228/orig -> origin/gh/guilhermeleobas/228/orig 2025-09-07T06:13:37.0005243Z * [new branch] gh/guilhermeleobas/229/base -> origin/gh/guilhermeleobas/229/base 2025-09-07T06:13:37.0006417Z * [new branch] gh/guilhermeleobas/229/head -> origin/gh/guilhermeleobas/229/head 2025-09-07T06:13:37.0007605Z * [new branch] gh/guilhermeleobas/229/orig -> origin/gh/guilhermeleobas/229/orig 2025-09-07T06:13:37.0009190Z * [new branch] gh/guilhermeleobas/230/base -> origin/gh/guilhermeleobas/230/base 2025-09-07T06:13:37.0010315Z * [new branch] gh/guilhermeleobas/230/head -> origin/gh/guilhermeleobas/230/head 2025-09-07T06:13:37.0011473Z * [new branch] gh/guilhermeleobas/230/orig -> origin/gh/guilhermeleobas/230/orig 
2025-09-07T06:13:37.0013398Z * [new branch] gh/guilhermeleobas/231/base -> origin/gh/guilhermeleobas/231/base 2025-09-07T06:13:37.0014526Z * [new branch] gh/guilhermeleobas/231/head -> origin/gh/guilhermeleobas/231/head 2025-09-07T06:13:37.0015670Z * [new branch] gh/guilhermeleobas/231/orig -> origin/gh/guilhermeleobas/231/orig 2025-09-07T06:13:37.0017319Z * [new branch] gh/guilhermeleobas/232/base -> origin/gh/guilhermeleobas/232/base 2025-09-07T06:13:37.0018461Z * [new branch] gh/guilhermeleobas/232/head -> origin/gh/guilhermeleobas/232/head 2025-09-07T06:13:37.0019627Z * [new branch] gh/guilhermeleobas/232/orig -> origin/gh/guilhermeleobas/232/orig 2025-09-07T06:13:37.0021330Z * [new branch] gh/guilhermeleobas/233/base -> origin/gh/guilhermeleobas/233/base 2025-09-07T06:13:37.0022410Z * [new branch] gh/guilhermeleobas/233/head -> origin/gh/guilhermeleobas/233/head 2025-09-07T06:13:37.0023669Z * [new branch] gh/guilhermeleobas/233/orig -> origin/gh/guilhermeleobas/233/orig 2025-09-07T06:13:37.0025380Z * [new branch] gh/guilhermeleobas/234/base -> origin/gh/guilhermeleobas/234/base 2025-09-07T06:13:37.0026545Z * [new branch] gh/guilhermeleobas/234/head -> origin/gh/guilhermeleobas/234/head 2025-09-07T06:13:37.0027693Z * [new branch] gh/guilhermeleobas/234/orig -> origin/gh/guilhermeleobas/234/orig 2025-09-07T06:13:37.0029309Z * [new branch] gh/guilhermeleobas/235/base -> origin/gh/guilhermeleobas/235/base 2025-09-07T06:13:37.0030485Z * [new branch] gh/guilhermeleobas/235/head -> origin/gh/guilhermeleobas/235/head 2025-09-07T06:13:37.0031701Z * [new branch] gh/guilhermeleobas/235/orig -> origin/gh/guilhermeleobas/235/orig 2025-09-07T06:13:37.0033207Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-09-07T06:13:37.0034322Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-09-07T06:13:37.0035476Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 
2025-09-07T06:13:37.0037062Z * [new branch] gh/guilhermeleobas/237/base -> origin/gh/guilhermeleobas/237/base 2025-09-07T06:13:37.0038159Z * [new branch] gh/guilhermeleobas/237/head -> origin/gh/guilhermeleobas/237/head 2025-09-07T06:13:37.0039303Z * [new branch] gh/guilhermeleobas/237/orig -> origin/gh/guilhermeleobas/237/orig 2025-09-07T06:13:37.0040831Z * [new branch] gh/guilhermeleobas/238/base -> origin/gh/guilhermeleobas/238/base 2025-09-07T06:13:37.0041963Z * [new branch] gh/guilhermeleobas/238/head -> origin/gh/guilhermeleobas/238/head 2025-09-07T06:13:37.0043090Z * [new branch] gh/guilhermeleobas/238/orig -> origin/gh/guilhermeleobas/238/orig 2025-09-07T06:13:37.0044705Z * [new branch] gh/guilhermeleobas/239/base -> origin/gh/guilhermeleobas/239/base 2025-09-07T06:13:37.0045831Z * [new branch] gh/guilhermeleobas/239/head -> origin/gh/guilhermeleobas/239/head 2025-09-07T06:13:37.0046972Z * [new branch] gh/guilhermeleobas/239/orig -> origin/gh/guilhermeleobas/239/orig 2025-09-07T06:13:37.0048596Z * [new branch] gh/guilhermeleobas/240/base -> origin/gh/guilhermeleobas/240/base 2025-09-07T06:13:37.0049720Z * [new branch] gh/guilhermeleobas/240/head -> origin/gh/guilhermeleobas/240/head 2025-09-07T06:13:37.0050812Z * [new branch] gh/guilhermeleobas/240/orig -> origin/gh/guilhermeleobas/240/orig 2025-09-07T06:13:37.0052360Z * [new branch] gh/guilhermeleobas/241/base -> origin/gh/guilhermeleobas/241/base 2025-09-07T06:13:37.0053972Z * [new branch] gh/guilhermeleobas/241/head -> origin/gh/guilhermeleobas/241/head 2025-09-07T06:13:37.0055097Z * [new branch] gh/guilhermeleobas/241/orig -> origin/gh/guilhermeleobas/241/orig 2025-09-07T06:13:37.0056898Z * [new branch] gh/guilhermeleobas/242/base -> origin/gh/guilhermeleobas/242/base 2025-09-07T06:13:37.0058014Z * [new branch] gh/guilhermeleobas/242/head -> origin/gh/guilhermeleobas/242/head 2025-09-07T06:13:37.0059212Z * [new branch] gh/guilhermeleobas/242/orig -> origin/gh/guilhermeleobas/242/orig 
2025-09-07T06:13:37.0060749Z * [new branch] gh/guilhermeleobas/243/base -> origin/gh/guilhermeleobas/243/base 2025-09-07T06:13:37.0061932Z * [new branch] gh/guilhermeleobas/243/head -> origin/gh/guilhermeleobas/243/head 2025-09-07T06:13:37.0063086Z * [new branch] gh/guilhermeleobas/243/orig -> origin/gh/guilhermeleobas/243/orig 2025-09-07T06:13:37.0064679Z * [new branch] gh/guilhermeleobas/244/base -> origin/gh/guilhermeleobas/244/base 2025-09-07T06:13:37.0065926Z * [new branch] gh/guilhermeleobas/244/head -> origin/gh/guilhermeleobas/244/head 2025-09-07T06:13:37.0067064Z * [new branch] gh/guilhermeleobas/244/orig -> origin/gh/guilhermeleobas/244/orig 2025-09-07T06:13:37.0068771Z * [new branch] gh/guilhermeleobas/245/base -> origin/gh/guilhermeleobas/245/base 2025-09-07T06:13:37.0069900Z * [new branch] gh/guilhermeleobas/245/head -> origin/gh/guilhermeleobas/245/head 2025-09-07T06:13:37.0071026Z * [new branch] gh/guilhermeleobas/245/orig -> origin/gh/guilhermeleobas/245/orig 2025-09-07T06:13:37.0072652Z * [new branch] gh/guilhermeleobas/73/base -> origin/gh/guilhermeleobas/73/base 2025-09-07T06:13:37.0073756Z * [new branch] gh/guilhermeleobas/73/head -> origin/gh/guilhermeleobas/73/head 2025-09-07T06:13:37.0074872Z * [new branch] gh/guilhermeleobas/73/orig -> origin/gh/guilhermeleobas/73/orig 2025-09-07T06:13:37.0076771Z * [new branch] gh/henrylhtsang/140/base -> origin/gh/henrylhtsang/140/base 2025-09-07T06:13:37.0078022Z * [new branch] gh/henrylhtsang/140/head -> origin/gh/henrylhtsang/140/head 2025-09-07T06:13:37.0079125Z * [new branch] gh/henrylhtsang/140/orig -> origin/gh/henrylhtsang/140/orig 2025-09-07T06:13:37.0080585Z * [new branch] gh/henrylhtsang/141/base -> origin/gh/henrylhtsang/141/base 2025-09-07T06:13:37.0081707Z * [new branch] gh/henrylhtsang/141/head -> origin/gh/henrylhtsang/141/head 2025-09-07T06:13:37.0082885Z * [new branch] gh/henrylhtsang/141/orig -> origin/gh/henrylhtsang/141/orig 2025-09-07T06:13:37.0084690Z * [new branch] 
gh/henrylhtsang/142/base -> origin/gh/henrylhtsang/142/base 2025-09-07T06:13:37.0085980Z * [new branch] gh/henrylhtsang/142/head -> origin/gh/henrylhtsang/142/head 2025-09-07T06:13:37.0087163Z * [new branch] gh/henrylhtsang/142/orig -> origin/gh/henrylhtsang/142/orig 2025-09-07T06:13:37.0088720Z * [new branch] gh/henrylhtsang/143/base -> origin/gh/henrylhtsang/143/base 2025-09-07T06:13:37.0089858Z * [new branch] gh/henrylhtsang/143/head -> origin/gh/henrylhtsang/143/head 2025-09-07T06:13:37.0090988Z * [new branch] gh/henrylhtsang/143/orig -> origin/gh/henrylhtsang/143/orig 2025-09-07T06:13:37.0093034Z * [new branch] gh/henrylhtsang/144/base -> origin/gh/henrylhtsang/144/base 2025-09-07T06:13:37.0094320Z * [new branch] gh/henrylhtsang/144/head -> origin/gh/henrylhtsang/144/head 2025-09-07T06:13:37.0095530Z * [new branch] gh/henrylhtsang/144/orig -> origin/gh/henrylhtsang/144/orig 2025-09-07T06:13:37.0097247Z * [new branch] gh/henrylhtsang/145/base -> origin/gh/henrylhtsang/145/base 2025-09-07T06:13:37.0098447Z * [new branch] gh/henrylhtsang/145/head -> origin/gh/henrylhtsang/145/head 2025-09-07T06:13:37.0099670Z * [new branch] gh/henrylhtsang/145/orig -> origin/gh/henrylhtsang/145/orig 2025-09-07T06:13:37.0101295Z * [new branch] gh/henrylhtsang/146/base -> origin/gh/henrylhtsang/146/base 2025-09-07T06:13:37.0102637Z * [new branch] gh/henrylhtsang/146/head -> origin/gh/henrylhtsang/146/head 2025-09-07T06:13:37.0103687Z * [new branch] gh/henrylhtsang/146/orig -> origin/gh/henrylhtsang/146/orig 2025-09-07T06:13:37.0105431Z * [new branch] gh/henrylhtsang/147/base -> origin/gh/henrylhtsang/147/base 2025-09-07T06:13:37.0106566Z * [new branch] gh/henrylhtsang/147/head -> origin/gh/henrylhtsang/147/head 2025-09-07T06:13:37.0107659Z * [new branch] gh/henrylhtsang/147/orig -> origin/gh/henrylhtsang/147/orig 2025-09-07T06:13:37.0109449Z * [new branch] gh/henrylhtsang/148/base -> origin/gh/henrylhtsang/148/base 2025-09-07T06:13:37.0110777Z * [new branch] 
gh/henrylhtsang/148/head -> origin/gh/henrylhtsang/148/head 2025-09-07T06:13:37.0111961Z * [new branch] gh/henrylhtsang/148/orig -> origin/gh/henrylhtsang/148/orig 2025-09-07T06:13:37.0113517Z * [new branch] gh/henrylhtsang/149/base -> origin/gh/henrylhtsang/149/base 2025-09-07T06:13:37.0114718Z * [new branch] gh/henrylhtsang/149/head -> origin/gh/henrylhtsang/149/head 2025-09-07T06:13:37.0115870Z * [new branch] gh/henrylhtsang/149/orig -> origin/gh/henrylhtsang/149/orig 2025-09-07T06:13:37.0117703Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-09-07T06:13:37.0119152Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-09-07T06:13:37.0120620Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-09-07T06:13:37.0122616Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-09-07T06:13:37.0124125Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-09-07T06:13:37.0125625Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-09-07T06:13:37.0127551Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-09-07T06:13:37.0128695Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-09-07T06:13:37.0130594Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-09-07T06:13:37.0131718Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-09-07T06:13:37.0133610Z * [new branch] gh/isuruf/141/base -> origin/gh/isuruf/141/base 2025-09-07T06:13:37.0134831Z * [new branch] gh/isuruf/141/head -> origin/gh/isuruf/141/head 2025-09-07T06:13:37.0135987Z * [new branch] gh/isuruf/141/orig -> origin/gh/isuruf/141/orig 2025-09-07T06:13:37.0137539Z * [new branch] gh/isuruf/142/base -> origin/gh/isuruf/142/base 2025-09-07T06:13:37.0138720Z * [new branch] gh/isuruf/142/head -> origin/gh/isuruf/142/head 2025-09-07T06:13:37.0140034Z * [new branch] gh/isuruf/142/orig -> origin/gh/isuruf/142/orig 2025-09-07T06:13:37.0141611Z * [new branch] gh/isuruf/143/base -> 
origin/gh/isuruf/143/base 2025-09-07T06:13:37.0142755Z * [new branch] gh/isuruf/143/head -> origin/gh/isuruf/143/head 2025-09-07T06:13:37.0143925Z * [new branch] gh/isuruf/143/orig -> origin/gh/isuruf/143/orig 2025-09-07T06:13:37.0145582Z * [new branch] gh/isuruf/144/base -> origin/gh/isuruf/144/base 2025-09-07T06:13:37.0146676Z * [new branch] gh/isuruf/144/head -> origin/gh/isuruf/144/head 2025-09-07T06:13:37.0147814Z * [new branch] gh/isuruf/144/orig -> origin/gh/isuruf/144/orig 2025-09-07T06:13:37.0149577Z * [new branch] gh/isuruf/145/base -> origin/gh/isuruf/145/base 2025-09-07T06:13:37.0150668Z * [new branch] gh/isuruf/145/head -> origin/gh/isuruf/145/head 2025-09-07T06:13:37.0151888Z * [new branch] gh/isuruf/145/orig -> origin/gh/isuruf/145/orig 2025-09-07T06:13:37.0153328Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-09-07T06:13:37.0154492Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-09-07T06:13:37.0155601Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-09-07T06:13:37.0157107Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-09-07T06:13:37.0158231Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-09-07T06:13:37.0159386Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-09-07T06:13:37.0161177Z * [new branch] gh/jamesjwu/150/base -> origin/gh/jamesjwu/150/base 2025-09-07T06:13:37.0162303Z * [new branch] gh/jamesjwu/150/head -> origin/gh/jamesjwu/150/head 2025-09-07T06:13:37.0163480Z * [new branch] gh/jamesjwu/150/orig -> origin/gh/jamesjwu/150/orig 2025-09-07T06:13:37.0165162Z * [new branch] gh/jamesjwu/154/base -> origin/gh/jamesjwu/154/base 2025-09-07T06:13:37.0166216Z * [new branch] gh/jamesjwu/154/head -> origin/gh/jamesjwu/154/head 2025-09-07T06:13:37.0167320Z * [new branch] gh/jamesjwu/154/orig -> origin/gh/jamesjwu/154/orig 2025-09-07T06:13:37.0168837Z * [new branch] gh/jamesjwu/155/base -> origin/gh/jamesjwu/155/base 
2025-09-07T06:13:37.0169991Z * [new branch] gh/jamesjwu/155/head -> origin/gh/jamesjwu/155/head 2025-09-07T06:13:37.0171082Z * [new branch] gh/jamesjwu/155/orig -> origin/gh/jamesjwu/155/orig 2025-09-07T06:13:37.0172642Z * [new branch] gh/jamesjwu/159/base -> origin/gh/jamesjwu/159/base 2025-09-07T06:13:37.0174093Z * [new branch] gh/jamesjwu/159/head -> origin/gh/jamesjwu/159/head 2025-09-07T06:13:37.0175311Z * [new branch] gh/jamesjwu/159/orig -> origin/gh/jamesjwu/159/orig 2025-09-07T06:13:37.0177321Z * [new branch] gh/jamesjwu/163/base -> origin/gh/jamesjwu/163/base 2025-09-07T06:13:37.0178520Z * [new branch] gh/jamesjwu/163/head -> origin/gh/jamesjwu/163/head 2025-09-07T06:13:37.0179691Z * [new branch] gh/jamesjwu/163/orig -> origin/gh/jamesjwu/163/orig 2025-09-07T06:13:37.0181239Z * [new branch] gh/jamesjwu/171/base -> origin/gh/jamesjwu/171/base 2025-09-07T06:13:37.0182396Z * [new branch] gh/jamesjwu/171/head -> origin/gh/jamesjwu/171/head 2025-09-07T06:13:37.0183524Z * [new branch] gh/jamesjwu/171/orig -> origin/gh/jamesjwu/171/orig 2025-09-07T06:13:37.0185233Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-09-07T06:13:37.0186379Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-09-07T06:13:37.0187491Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-09-07T06:13:37.0189675Z * [new branch] gh/jamesjwu/181/base -> origin/gh/jamesjwu/181/base 2025-09-07T06:13:37.0190855Z * [new branch] gh/jamesjwu/181/head -> origin/gh/jamesjwu/181/head 2025-09-07T06:13:37.0192075Z * [new branch] gh/jamesjwu/181/orig -> origin/gh/jamesjwu/181/orig 2025-09-07T06:13:37.0193986Z * [new branch] gh/jamesjwu/182/base -> origin/gh/jamesjwu/182/base 2025-09-07T06:13:37.0195163Z * [new branch] gh/jamesjwu/182/head -> origin/gh/jamesjwu/182/head 2025-09-07T06:13:37.0196349Z * [new branch] gh/jamesjwu/182/orig -> origin/gh/jamesjwu/182/orig 2025-09-07T06:13:37.0197888Z * [new branch] gh/jamesjwu/183/base -> 
origin/gh/jamesjwu/183/base 2025-09-07T06:13:37.0199194Z * [new branch] gh/jamesjwu/183/head -> origin/gh/jamesjwu/183/head 2025-09-07T06:13:37.0200259Z * [new branch] gh/jamesjwu/183/orig -> origin/gh/jamesjwu/183/orig 2025-09-07T06:13:37.0201883Z * [new branch] gh/jamesjwu/184/base -> origin/gh/jamesjwu/184/base 2025-09-07T06:13:37.0203005Z * [new branch] gh/jamesjwu/184/head -> origin/gh/jamesjwu/184/head 2025-09-07T06:13:37.0204353Z * [new branch] gh/jamesjwu/184/orig -> origin/gh/jamesjwu/184/orig 2025-09-07T06:13:37.0205878Z * [new branch] gh/jamesjwu/185/base -> origin/gh/jamesjwu/185/base 2025-09-07T06:13:37.0207174Z * [new branch] gh/jamesjwu/185/head -> origin/gh/jamesjwu/185/head 2025-09-07T06:13:37.0208311Z * [new branch] gh/jamesjwu/185/orig -> origin/gh/jamesjwu/185/orig 2025-09-07T06:13:37.0210022Z * [new branch] gh/jamesjwu/186/base -> origin/gh/jamesjwu/186/base 2025-09-07T06:13:37.0210839Z * [new branch] gh/jamesjwu/186/head -> origin/gh/jamesjwu/186/head 2025-09-07T06:13:37.0212022Z * [new branch] gh/jamesjwu/186/orig -> origin/gh/jamesjwu/186/orig 2025-09-07T06:13:37.0213970Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-09-07T06:13:37.0215087Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-09-07T06:13:37.0216308Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-09-07T06:13:37.0217889Z * [new branch] gh/jamesjwu/188/base -> origin/gh/jamesjwu/188/base 2025-09-07T06:13:37.0219044Z * [new branch] gh/jamesjwu/188/head -> origin/gh/jamesjwu/188/head 2025-09-07T06:13:37.0220199Z * [new branch] gh/jamesjwu/188/orig -> origin/gh/jamesjwu/188/orig 2025-09-07T06:13:37.0221679Z * [new branch] gh/jamesjwu/189/base -> origin/gh/jamesjwu/189/base 2025-09-07T06:13:37.0222905Z * [new branch] gh/jamesjwu/189/head -> origin/gh/jamesjwu/189/head 2025-09-07T06:13:37.0224054Z * [new branch] gh/jamesjwu/189/orig -> origin/gh/jamesjwu/189/orig 2025-09-07T06:13:37.0226145Z * [new branch] 
gh/jamesjwu/190/base -> origin/gh/jamesjwu/190/base 2025-09-07T06:13:37.0227286Z * [new branch] gh/jamesjwu/190/head -> origin/gh/jamesjwu/190/head 2025-09-07T06:13:37.0228418Z * [new branch] gh/jamesjwu/190/orig -> origin/gh/jamesjwu/190/orig 2025-09-07T06:13:37.0230076Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-09-07T06:13:37.0231227Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-09-07T06:13:37.0232689Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-09-07T06:13:37.0233777Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-09-07T06:13:37.0235192Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-09-07T06:13:37.0236281Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-09-07T06:13:37.0237727Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-09-07T06:13:37.0238810Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-09-07T06:13:37.0240240Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-09-07T06:13:37.0241334Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-09-07T06:13:37.0242789Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-09-07T06:13:37.0243821Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-09-07T06:13:37.0245307Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-09-07T06:13:37.0246348Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-09-07T06:13:37.0247753Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-09-07T06:13:37.0248815Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-09-07T06:13:37.0250244Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-09-07T06:13:37.0251309Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-09-07T06:13:37.0252798Z * [new branch] gh/jamesjwu/61/base 
-> origin/gh/jamesjwu/61/base 2025-09-07T06:13:37.0254402Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-09-07T06:13:37.0255860Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-09-07T06:13:37.0256980Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-09-07T06:13:37.0258449Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-09-07T06:13:37.0259656Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-09-07T06:13:37.0261394Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-09-07T06:13:37.0262538Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-09-07T06:13:37.0264035Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-09-07T06:13:37.0265102Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-09-07T06:13:37.0267177Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-09-07T06:13:37.0268382Z * [new branch] gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-09-07T06:13:37.0269564Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 2025-09-07T06:13:37.0270934Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-09-07T06:13:37.0272071Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-09-07T06:13:37.0273247Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-09-07T06:13:37.0275184Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-09-07T06:13:37.0276358Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-09-07T06:13:37.0277495Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-09-07T06:13:37.0279131Z * [new branch] gh/janeyx99/296/base -> origin/gh/janeyx99/296/base 2025-09-07T06:13:37.0280261Z * [new branch] gh/janeyx99/296/head -> origin/gh/janeyx99/296/head 2025-09-07T06:13:37.0281381Z * [new branch] gh/janeyx99/296/orig -> 
origin/gh/janeyx99/296/orig 2025-09-07T06:13:37.0282891Z * [new branch] gh/janeyx99/297/base -> origin/gh/janeyx99/297/base 2025-09-07T06:13:37.0284028Z * [new branch] gh/janeyx99/297/head -> origin/gh/janeyx99/297/head 2025-09-07T06:13:37.0285133Z * [new branch] gh/janeyx99/297/orig -> origin/gh/janeyx99/297/orig 2025-09-07T06:13:37.0286684Z * [new branch] gh/janeyx99/298/base -> origin/gh/janeyx99/298/base 2025-09-07T06:13:37.0287819Z * [new branch] gh/janeyx99/298/head -> origin/gh/janeyx99/298/head 2025-09-07T06:13:37.0288978Z * [new branch] gh/janeyx99/298/orig -> origin/gh/janeyx99/298/orig 2025-09-07T06:13:37.0290527Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-09-07T06:13:37.0291712Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-09-07T06:13:37.0293392Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-09-07T06:13:37.0295169Z * [new branch] gh/janeyx99/300/base -> origin/gh/janeyx99/300/base 2025-09-07T06:13:37.0296536Z * [new branch] gh/janeyx99/300/head -> origin/gh/janeyx99/300/head 2025-09-07T06:13:37.0297742Z * [new branch] gh/janeyx99/300/orig -> origin/gh/janeyx99/300/orig 2025-09-07T06:13:37.0299376Z * [new branch] gh/janeyx99/301/base -> origin/gh/janeyx99/301/base 2025-09-07T06:13:37.0300537Z * [new branch] gh/janeyx99/301/head -> origin/gh/janeyx99/301/head 2025-09-07T06:13:37.0301706Z * [new branch] gh/janeyx99/301/orig -> origin/gh/janeyx99/301/orig 2025-09-07T06:13:37.0303152Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-09-07T06:13:37.0304501Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-09-07T06:13:37.0305897Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-09-07T06:13:37.0306962Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-09-07T06:13:37.0308599Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-09-07T06:13:37.0309771Z * [new branch] 
gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-09-07T06:13:37.0310876Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-09-07T06:13:37.0312720Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-09-07T06:13:37.0313824Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-09-07T06:13:37.0315384Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-09-07T06:13:37.0316530Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-09-07T06:13:37.0317659Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-09-07T06:13:37.0319187Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-09-07T06:13:37.0320290Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-09-07T06:13:37.0321444Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-09-07T06:13:37.0322913Z * [new branch] gh/jansel/531/base -> origin/gh/jansel/531/base 2025-09-07T06:13:37.0324022Z * [new branch] gh/jansel/531/head -> origin/gh/jansel/531/head 2025-09-07T06:13:37.0325152Z * [new branch] gh/jansel/531/orig -> origin/gh/jansel/531/orig 2025-09-07T06:13:37.0327130Z * [new branch] gh/jbschlosser/208/head -> origin/gh/jbschlosser/208/head 2025-09-07T06:13:37.0328734Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-09-07T06:13:37.0329874Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-09-07T06:13:37.0331001Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-09-07T06:13:37.0332707Z * [new branch] gh/jbschlosser/248/base -> origin/gh/jbschlosser/248/base 2025-09-07T06:13:37.0334259Z * [new branch] gh/jbschlosser/248/head -> origin/gh/jbschlosser/248/head 2025-09-07T06:13:37.0335378Z * [new branch] gh/jbschlosser/248/orig -> origin/gh/jbschlosser/248/orig 2025-09-07T06:13:37.0337107Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 
2025-09-07T06:13:37.0338302Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-09-07T06:13:37.0339537Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-09-07T06:13:37.0341305Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-09-07T06:13:37.0342527Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-09-07T06:13:37.0343845Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-09-07T06:13:37.0345443Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-09-07T06:13:37.0346555Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-09-07T06:13:37.0347763Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-09-07T06:13:37.0349281Z * [new branch] gh/jiayisunx/64/base -> origin/gh/jiayisunx/64/base 2025-09-07T06:13:37.0350393Z * [new branch] gh/jiayisunx/64/head -> origin/gh/jiayisunx/64/head 2025-09-07T06:13:37.0351492Z * [new branch] gh/jiayisunx/64/orig -> origin/gh/jiayisunx/64/orig 2025-09-07T06:13:37.0353010Z * [new branch] gh/jiayisunx/65/base -> origin/gh/jiayisunx/65/base 2025-09-07T06:13:37.0354185Z * [new branch] gh/jiayisunx/65/head -> origin/gh/jiayisunx/65/head 2025-09-07T06:13:37.0355300Z * [new branch] gh/jiayisunx/65/orig -> origin/gh/jiayisunx/65/orig 2025-09-07T06:13:37.0356859Z * [new branch] gh/jiayisunx/66/base -> origin/gh/jiayisunx/66/base 2025-09-07T06:13:37.0357997Z * [new branch] gh/jiayisunx/66/head -> origin/gh/jiayisunx/66/head 2025-09-07T06:13:37.0359144Z * [new branch] gh/jiayisunx/66/orig -> origin/gh/jiayisunx/66/orig 2025-09-07T06:13:37.0360683Z * [new branch] gh/jiayisunx/67/base -> origin/gh/jiayisunx/67/base 2025-09-07T06:13:37.0361831Z * [new branch] gh/jiayisunx/67/head -> origin/gh/jiayisunx/67/head 2025-09-07T06:13:37.0362968Z * [new branch] gh/jiayisunx/67/orig -> origin/gh/jiayisunx/67/orig 2025-09-07T06:13:37.0364966Z * [new branch] gh/jiayisunx/68/base -> 
origin/gh/jiayisunx/68/base 2025-09-07T06:13:37.0366057Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-09-07T06:13:37.0367218Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-09-07T06:13:37.0368764Z * [new branch] gh/jiayisunx/69/base -> origin/gh/jiayisunx/69/base 2025-09-07T06:13:37.0369882Z * [new branch] gh/jiayisunx/69/head -> origin/gh/jiayisunx/69/head 2025-09-07T06:13:37.0371035Z * [new branch] gh/jiayisunx/69/orig -> origin/gh/jiayisunx/69/orig 2025-09-07T06:13:37.0372685Z * [new branch] gh/jiayisunx/70/base -> origin/gh/jiayisunx/70/base 2025-09-07T06:13:37.0374130Z * [new branch] gh/jiayisunx/70/head -> origin/gh/jiayisunx/70/head 2025-09-07T06:13:37.0375264Z * [new branch] gh/jiayisunx/70/orig -> origin/gh/jiayisunx/70/orig 2025-09-07T06:13:37.0376820Z * [new branch] gh/jiayisunx/71/base -> origin/gh/jiayisunx/71/base 2025-09-07T06:13:37.0378001Z * [new branch] gh/jiayisunx/71/head -> origin/gh/jiayisunx/71/head 2025-09-07T06:13:37.0379637Z * [new branch] gh/jiayisunx/71/orig -> origin/gh/jiayisunx/71/orig 2025-09-07T06:13:37.0381250Z * [new branch] gh/jiayisunx/72/base -> origin/gh/jiayisunx/72/base 2025-09-07T06:13:37.0382420Z * [new branch] gh/jiayisunx/72/head -> origin/gh/jiayisunx/72/head 2025-09-07T06:13:37.0383565Z * [new branch] gh/jiayisunx/72/orig -> origin/gh/jiayisunx/72/orig 2025-09-07T06:13:37.0385265Z * [new branch] gh/jiayisunx/73/base -> origin/gh/jiayisunx/73/base 2025-09-07T06:13:37.0386547Z * [new branch] gh/jiayisunx/73/head -> origin/gh/jiayisunx/73/head 2025-09-07T06:13:37.0387645Z * [new branch] gh/jiayisunx/73/orig -> origin/gh/jiayisunx/73/orig 2025-09-07T06:13:37.0389102Z * [new branch] gh/jiayisunx/74/base -> origin/gh/jiayisunx/74/base 2025-09-07T06:13:37.0390268Z * [new branch] gh/jiayisunx/74/head -> origin/gh/jiayisunx/74/head 2025-09-07T06:13:37.0391406Z * [new branch] gh/jiayisunx/74/orig -> origin/gh/jiayisunx/74/orig 2025-09-07T06:13:37.0393432Z * [new branch] 
gh/jiayisunx/75/base -> origin/gh/jiayisunx/75/base 2025-09-07T06:13:37.0394629Z * [new branch] gh/jiayisunx/75/head -> origin/gh/jiayisunx/75/head 2025-09-07T06:13:37.0395627Z * [new branch] gh/jiayisunx/75/orig -> origin/gh/jiayisunx/75/orig 2025-09-07T06:13:37.0397124Z * [new branch] gh/jiayisunx/76/base -> origin/gh/jiayisunx/76/base 2025-09-07T06:13:37.0398219Z * [new branch] gh/jiayisunx/76/head -> origin/gh/jiayisunx/76/head 2025-09-07T06:13:37.0399402Z * [new branch] gh/jiayisunx/76/orig -> origin/gh/jiayisunx/76/orig 2025-09-07T06:13:37.0401407Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-09-07T06:13:37.0402496Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-09-07T06:13:37.0404472Z * [new branch] gh/justinchuby/111/base -> origin/gh/justinchuby/111/base 2025-09-07T06:13:37.0405711Z * [new branch] gh/justinchuby/111/head -> origin/gh/justinchuby/111/head 2025-09-07T06:13:37.0406954Z * [new branch] gh/justinchuby/111/orig -> origin/gh/justinchuby/111/orig 2025-09-07T06:13:37.0408525Z * [new branch] gh/justinchuby/112/base -> origin/gh/justinchuby/112/base 2025-09-07T06:13:37.0409596Z * [new branch] gh/justinchuby/112/head -> origin/gh/justinchuby/112/head 2025-09-07T06:13:37.0410774Z * [new branch] gh/justinchuby/112/orig -> origin/gh/justinchuby/112/orig 2025-09-07T06:13:37.0412439Z * [new branch] gh/justinchuby/113/base -> origin/gh/justinchuby/113/base 2025-09-07T06:13:37.0414002Z * [new branch] gh/justinchuby/113/head -> origin/gh/justinchuby/113/head 2025-09-07T06:13:37.0415307Z * [new branch] gh/justinchuby/113/orig -> origin/gh/justinchuby/113/orig 2025-09-07T06:13:37.0416793Z * [new branch] gh/justinchuby/114/base -> origin/gh/justinchuby/114/base 2025-09-07T06:13:37.0418004Z * [new branch] gh/justinchuby/114/head -> origin/gh/justinchuby/114/head 2025-09-07T06:13:37.0419164Z * [new branch] gh/justinchuby/114/orig -> origin/gh/justinchuby/114/orig 2025-09-07T06:13:37.0420735Z * [new 
branch] gh/justinchuby/115/base -> origin/gh/justinchuby/115/base 2025-09-07T06:13:37.0421895Z * [new branch] gh/justinchuby/115/head -> origin/gh/justinchuby/115/head 2025-09-07T06:13:37.0422997Z * [new branch] gh/justinchuby/115/orig -> origin/gh/justinchuby/115/orig 2025-09-07T06:13:37.0424856Z * [new branch] gh/karthickai/1/base -> origin/gh/karthickai/1/base 2025-09-07T06:13:37.0426124Z * [new branch] gh/karthickai/1/head -> origin/gh/karthickai/1/head 2025-09-07T06:13:37.0427282Z * [new branch] gh/karthickai/1/orig -> origin/gh/karthickai/1/orig 2025-09-07T06:13:37.0428792Z * [new branch] gh/karthickai/2/base -> origin/gh/karthickai/2/base 2025-09-07T06:13:37.0429912Z * [new branch] gh/karthickai/2/head -> origin/gh/karthickai/2/head 2025-09-07T06:13:37.0431014Z * [new branch] gh/karthickai/2/orig -> origin/gh/karthickai/2/orig 2025-09-07T06:13:37.0432888Z * [new branch] gh/kurtamohler/32/base -> origin/gh/kurtamohler/32/base 2025-09-07T06:13:37.0434139Z * [new branch] gh/kurtamohler/32/head -> origin/gh/kurtamohler/32/head 2025-09-07T06:13:37.0435145Z * [new branch] gh/kurtamohler/32/orig -> origin/gh/kurtamohler/32/orig 2025-09-07T06:13:37.0436699Z * [new branch] gh/kurtamohler/33/base -> origin/gh/kurtamohler/33/base 2025-09-07T06:13:37.0437803Z * [new branch] gh/kurtamohler/33/head -> origin/gh/kurtamohler/33/head 2025-09-07T06:13:37.0438925Z * [new branch] gh/kurtamohler/33/orig -> origin/gh/kurtamohler/33/orig 2025-09-07T06:13:37.0440502Z * [new branch] gh/kurtamohler/34/base -> origin/gh/kurtamohler/34/base 2025-09-07T06:13:37.0441604Z * [new branch] gh/kurtamohler/34/head -> origin/gh/kurtamohler/34/head 2025-09-07T06:13:37.0442730Z * [new branch] gh/kurtamohler/34/orig -> origin/gh/kurtamohler/34/orig 2025-09-07T06:13:37.0444276Z * [new branch] gh/kurtamohler/41/base -> origin/gh/kurtamohler/41/base 2025-09-07T06:13:37.0445398Z * [new branch] gh/kurtamohler/41/head -> origin/gh/kurtamohler/41/head 2025-09-07T06:13:37.0446539Z * [new branch] 
gh/kurtamohler/41/orig -> origin/gh/kurtamohler/41/orig 2025-09-07T06:13:37.0448010Z * [new branch] gh/kurtamohler/46/base -> origin/gh/kurtamohler/46/base 2025-09-07T06:13:37.0449103Z * [new branch] gh/kurtamohler/46/head -> origin/gh/kurtamohler/46/head 2025-09-07T06:13:37.0450298Z * [new branch] gh/kurtamohler/46/orig -> origin/gh/kurtamohler/46/orig 2025-09-07T06:13:37.0451845Z * [new branch] gh/kurtamohler/47/base -> origin/gh/kurtamohler/47/base 2025-09-07T06:13:37.0453249Z * [new branch] gh/kurtamohler/47/head -> origin/gh/kurtamohler/47/head 2025-09-07T06:13:37.0454549Z * [new branch] gh/kurtamohler/47/orig -> origin/gh/kurtamohler/47/orig 2025-09-07T06:13:37.0456122Z * [new branch] gh/kurtamohler/48/base -> origin/gh/kurtamohler/48/base 2025-09-07T06:13:37.0457308Z * [new branch] gh/kurtamohler/48/head -> origin/gh/kurtamohler/48/head 2025-09-07T06:13:37.0458446Z * [new branch] gh/kurtamohler/48/orig -> origin/gh/kurtamohler/48/orig 2025-09-07T06:13:37.0460000Z * [new branch] gh/kurtamohler/49/base -> origin/gh/kurtamohler/49/base 2025-09-07T06:13:37.0461156Z * [new branch] gh/kurtamohler/49/head -> origin/gh/kurtamohler/49/head 2025-09-07T06:13:37.0462272Z * [new branch] gh/kurtamohler/49/orig -> origin/gh/kurtamohler/49/orig 2025-09-07T06:13:37.0463877Z * [new branch] gh/kurtamohler/50/base -> origin/gh/kurtamohler/50/base 2025-09-07T06:13:37.0465128Z * [new branch] gh/kurtamohler/50/head -> origin/gh/kurtamohler/50/head 2025-09-07T06:13:37.0466276Z * [new branch] gh/kurtamohler/50/orig -> origin/gh/kurtamohler/50/orig 2025-09-07T06:13:37.0468268Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-09-07T06:13:37.0469607Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-09-07T06:13:37.0470773Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-09-07T06:13:37.0472273Z * [new branch] gh/kwen2501/15/base -> origin/gh/kwen2501/15/base 2025-09-07T06:13:37.0473407Z * [new branch] 
gh/kwen2501/15/head -> origin/gh/kwen2501/15/head 2025-09-07T06:13:37.0474973Z * [new branch] gh/kwen2501/156/base -> origin/gh/kwen2501/156/base 2025-09-07T06:13:37.0476082Z * [new branch] gh/kwen2501/156/head -> origin/gh/kwen2501/156/head 2025-09-07T06:13:37.0477202Z * [new branch] gh/kwen2501/156/orig -> origin/gh/kwen2501/156/orig 2025-09-07T06:13:37.0478779Z * [new branch] gh/kwen2501/170/base -> origin/gh/kwen2501/170/base 2025-09-07T06:13:37.0479987Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-09-07T06:13:37.0481588Z * [new branch] gh/kwen2501/186/base -> origin/gh/kwen2501/186/base 2025-09-07T06:13:37.0482790Z * [new branch] gh/kwen2501/186/head -> origin/gh/kwen2501/186/head 2025-09-07T06:13:37.0483935Z * [new branch] gh/kwen2501/186/orig -> origin/gh/kwen2501/186/orig 2025-09-07T06:13:37.0485867Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-09-07T06:13:37.0487063Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-09-07T06:13:37.0488243Z * [new branch] gh/kwen2501/187/orig -> origin/gh/kwen2501/187/orig 2025-09-07T06:13:37.0489832Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-09-07T06:13:37.0490978Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-09-07T06:13:37.0492226Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-09-07T06:13:37.0496667Z * [new branch] gh/kwen2501/194/base -> origin/gh/kwen2501/194/base 2025-09-07T06:13:37.0497859Z * [new branch] gh/kwen2501/194/head -> origin/gh/kwen2501/194/head 2025-09-07T06:13:37.0499004Z * [new branch] gh/kwen2501/194/orig -> origin/gh/kwen2501/194/orig 2025-09-07T06:13:37.0500610Z * [new branch] gh/kwen2501/199/base -> origin/gh/kwen2501/199/base 2025-09-07T06:13:37.0501808Z * [new branch] gh/kwen2501/199/head -> origin/gh/kwen2501/199/head 2025-09-07T06:13:37.0502966Z * [new branch] gh/kwen2501/199/orig -> origin/gh/kwen2501/199/orig 2025-09-07T06:13:37.0504465Z 
* [new branch] gh/kwen2501/200/base -> origin/gh/kwen2501/200/base 2025-09-07T06:13:37.0505798Z * [new branch] gh/kwen2501/200/head -> origin/gh/kwen2501/200/head 2025-09-07T06:13:37.0506901Z * [new branch] gh/kwen2501/200/orig -> origin/gh/kwen2501/200/orig 2025-09-07T06:13:37.0508424Z * [new branch] gh/kwen2501/201/base -> origin/gh/kwen2501/201/base 2025-09-07T06:13:37.0509563Z * [new branch] gh/kwen2501/201/head -> origin/gh/kwen2501/201/head 2025-09-07T06:13:37.0510685Z * [new branch] gh/kwen2501/201/orig -> origin/gh/kwen2501/201/orig 2025-09-07T06:13:37.0512228Z * [new branch] gh/kwen2501/203/base -> origin/gh/kwen2501/203/base 2025-09-07T06:13:37.0513330Z * [new branch] gh/kwen2501/203/head -> origin/gh/kwen2501/203/head 2025-09-07T06:13:37.0514478Z * [new branch] gh/kwen2501/203/orig -> origin/gh/kwen2501/203/orig 2025-09-07T06:13:37.0515985Z * [new branch] gh/kwen2501/204/base -> origin/gh/kwen2501/204/base 2025-09-07T06:13:37.0517087Z * [new branch] gh/kwen2501/204/head -> origin/gh/kwen2501/204/head 2025-09-07T06:13:37.0518196Z * [new branch] gh/kwen2501/204/orig -> origin/gh/kwen2501/204/orig 2025-09-07T06:13:37.0519713Z * [new branch] gh/kwen2501/205/base -> origin/gh/kwen2501/205/base 2025-09-07T06:13:37.0520821Z * [new branch] gh/kwen2501/205/head -> origin/gh/kwen2501/205/head 2025-09-07T06:13:37.0521957Z * [new branch] gh/kwen2501/205/orig -> origin/gh/kwen2501/205/orig 2025-09-07T06:13:37.0523493Z * [new branch] gh/kwen2501/206/base -> origin/gh/kwen2501/206/base 2025-09-07T06:13:37.0524562Z * [new branch] gh/kwen2501/206/head -> origin/gh/kwen2501/206/head 2025-09-07T06:13:37.0525682Z * [new branch] gh/kwen2501/206/orig -> origin/gh/kwen2501/206/orig 2025-09-07T06:13:37.0527303Z * [new branch] gh/kwen2501/207/base -> origin/gh/kwen2501/207/base 2025-09-07T06:13:37.0528329Z * [new branch] gh/kwen2501/207/head -> origin/gh/kwen2501/207/head 2025-09-07T06:13:37.0529512Z * [new branch] gh/kwen2501/207/orig -> origin/gh/kwen2501/207/orig 
2025-09-07T06:13:37.0531025Z * [new branch] gh/kwen2501/208/base -> origin/gh/kwen2501/208/base 2025-09-07T06:13:37.0532127Z * [new branch] gh/kwen2501/208/head -> origin/gh/kwen2501/208/head 2025-09-07T06:13:37.0533591Z * [new branch] gh/kwen2501/208/orig -> origin/gh/kwen2501/208/orig 2025-09-07T06:13:37.0535604Z * [new branch] gh/kwen2501/209/base -> origin/gh/kwen2501/209/base 2025-09-07T06:13:37.0536795Z * [new branch] gh/kwen2501/209/head -> origin/gh/kwen2501/209/head 2025-09-07T06:13:37.0537963Z * [new branch] gh/kwen2501/209/orig -> origin/gh/kwen2501/209/orig 2025-09-07T06:13:37.0540117Z * [new branch] gh/kwen2501/210/base -> origin/gh/kwen2501/210/base 2025-09-07T06:13:37.0541255Z * [new branch] gh/kwen2501/210/head -> origin/gh/kwen2501/210/head 2025-09-07T06:13:37.0542490Z * [new branch] gh/kwen2501/210/orig -> origin/gh/kwen2501/210/orig 2025-09-07T06:13:37.0544115Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-09-07T06:13:37.0545392Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-09-07T06:13:37.0546894Z * [new branch] gh/kwen2501/212/base -> origin/gh/kwen2501/212/base 2025-09-07T06:13:37.0548010Z * [new branch] gh/kwen2501/212/head -> origin/gh/kwen2501/212/head 2025-09-07T06:13:37.0549169Z * [new branch] gh/kwen2501/212/orig -> origin/gh/kwen2501/212/orig 2025-09-07T06:13:37.0550844Z * [new branch] gh/kwen2501/213/base -> origin/gh/kwen2501/213/base 2025-09-07T06:13:37.0551973Z * [new branch] gh/kwen2501/213/head -> origin/gh/kwen2501/213/head 2025-09-07T06:13:37.0553084Z * [new branch] gh/kwen2501/213/orig -> origin/gh/kwen2501/213/orig 2025-09-07T06:13:37.0554708Z * [new branch] gh/kwen2501/214/base -> origin/gh/kwen2501/214/base 2025-09-07T06:13:37.0555815Z * [new branch] gh/kwen2501/214/head -> origin/gh/kwen2501/214/head 2025-09-07T06:13:37.0556934Z * [new branch] gh/kwen2501/214/orig -> origin/gh/kwen2501/214/orig 2025-09-07T06:13:37.0558517Z * [new branch] gh/kwen2501/215/base -> 
origin/gh/kwen2501/215/base 2025-09-07T06:13:37.0559613Z * [new branch] gh/kwen2501/215/head -> origin/gh/kwen2501/215/head 2025-09-07T06:13:37.0560757Z * [new branch] gh/kwen2501/215/orig -> origin/gh/kwen2501/215/orig 2025-09-07T06:13:37.0562228Z * [new branch] gh/kwen2501/216/base -> origin/gh/kwen2501/216/base 2025-09-07T06:13:37.0563364Z * [new branch] gh/kwen2501/216/head -> origin/gh/kwen2501/216/head 2025-09-07T06:13:37.0564502Z * [new branch] gh/kwen2501/216/orig -> origin/gh/kwen2501/216/orig 2025-09-07T06:13:37.0565947Z * [new branch] gh/kwen2501/217/base -> origin/gh/kwen2501/217/base 2025-09-07T06:13:37.0567125Z * [new branch] gh/kwen2501/217/head -> origin/gh/kwen2501/217/head 2025-09-07T06:13:37.0568322Z * [new branch] gh/kwen2501/217/orig -> origin/gh/kwen2501/217/orig 2025-09-07T06:13:37.0569836Z * [new branch] gh/kwen2501/218/base -> origin/gh/kwen2501/218/base 2025-09-07T06:13:37.0570945Z * [new branch] gh/kwen2501/218/head -> origin/gh/kwen2501/218/head 2025-09-07T06:13:37.0572083Z * [new branch] gh/kwen2501/218/orig -> origin/gh/kwen2501/218/orig 2025-09-07T06:13:37.0574049Z * [new branch] gh/kwen2501/219/base -> origin/gh/kwen2501/219/base 2025-09-07T06:13:37.0575100Z * [new branch] gh/kwen2501/219/head -> origin/gh/kwen2501/219/head 2025-09-07T06:13:37.0576255Z * [new branch] gh/kwen2501/219/orig -> origin/gh/kwen2501/219/orig 2025-09-07T06:13:37.0577885Z * [new branch] gh/kwen2501/220/base -> origin/gh/kwen2501/220/base 2025-09-07T06:13:37.0579029Z * [new branch] gh/kwen2501/220/head -> origin/gh/kwen2501/220/head 2025-09-07T06:13:37.0580164Z * [new branch] gh/kwen2501/220/orig -> origin/gh/kwen2501/220/orig 2025-09-07T06:13:37.0581793Z * [new branch] gh/kwen2501/221/base -> origin/gh/kwen2501/221/base 2025-09-07T06:13:37.0582940Z * [new branch] gh/kwen2501/221/head -> origin/gh/kwen2501/221/head 2025-09-07T06:13:37.0584079Z * [new branch] gh/kwen2501/221/orig -> origin/gh/kwen2501/221/orig 2025-09-07T06:13:37.0585790Z * [new branch] 
gh/kwen2501/222/base -> origin/gh/kwen2501/222/base 2025-09-07T06:13:37.0586915Z * [new branch] gh/kwen2501/222/head -> origin/gh/kwen2501/222/head 2025-09-07T06:13:37.0588017Z * [new branch] gh/kwen2501/222/orig -> origin/gh/kwen2501/222/orig 2025-09-07T06:13:37.0589530Z * [new branch] gh/kwen2501/223/base -> origin/gh/kwen2501/223/base 2025-09-07T06:13:37.0590661Z * [new branch] gh/kwen2501/223/head -> origin/gh/kwen2501/223/head 2025-09-07T06:13:37.0591731Z * [new branch] gh/kwen2501/223/orig -> origin/gh/kwen2501/223/orig 2025-09-07T06:13:37.0593798Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-09-07T06:13:37.0594947Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-09-07T06:13:37.0596148Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-09-07T06:13:37.0597778Z * [new branch] gh/kwen2501/225/base -> origin/gh/kwen2501/225/base 2025-09-07T06:13:37.0598882Z * [new branch] gh/kwen2501/225/head -> origin/gh/kwen2501/225/head 2025-09-07T06:13:37.0600019Z * [new branch] gh/kwen2501/225/orig -> origin/gh/kwen2501/225/orig 2025-09-07T06:13:37.0601625Z * [new branch] gh/kwen2501/226/base -> origin/gh/kwen2501/226/base 2025-09-07T06:13:37.0602743Z * [new branch] gh/kwen2501/226/head -> origin/gh/kwen2501/226/head 2025-09-07T06:13:37.0603983Z * [new branch] gh/kwen2501/226/orig -> origin/gh/kwen2501/226/orig 2025-09-07T06:13:37.0605676Z * [new branch] gh/kwen2501/227/base -> origin/gh/kwen2501/227/base 2025-09-07T06:13:37.0606863Z * [new branch] gh/kwen2501/227/head -> origin/gh/kwen2501/227/head 2025-09-07T06:13:37.0607978Z * [new branch] gh/kwen2501/227/orig -> origin/gh/kwen2501/227/orig 2025-09-07T06:13:37.0609594Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-09-07T06:13:37.0610635Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-09-07T06:13:37.0611775Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 
2025-09-07T06:13:37.0613718Z * [new branch] gh/kwen2501/229/base -> origin/gh/kwen2501/229/base 2025-09-07T06:13:37.0614843Z * [new branch] gh/kwen2501/229/head -> origin/gh/kwen2501/229/head 2025-09-07T06:13:37.0616145Z * [new branch] gh/kwen2501/229/orig -> origin/gh/kwen2501/229/orig 2025-09-07T06:13:37.0617791Z * [new branch] gh/kwen2501/230/base -> origin/gh/kwen2501/230/base 2025-09-07T06:13:37.0618913Z * [new branch] gh/kwen2501/230/head -> origin/gh/kwen2501/230/head 2025-09-07T06:13:37.0620186Z * [new branch] gh/kwen2501/230/orig -> origin/gh/kwen2501/230/orig 2025-09-07T06:13:37.0621675Z * [new branch] gh/kwen2501/231/base -> origin/gh/kwen2501/231/base 2025-09-07T06:13:37.0622863Z * [new branch] gh/kwen2501/231/head -> origin/gh/kwen2501/231/head 2025-09-07T06:13:37.0624025Z * [new branch] gh/kwen2501/231/orig -> origin/gh/kwen2501/231/orig 2025-09-07T06:13:37.0625671Z * [new branch] gh/kwen2501/232/base -> origin/gh/kwen2501/232/base 2025-09-07T06:13:37.0626847Z * [new branch] gh/kwen2501/232/head -> origin/gh/kwen2501/232/head 2025-09-07T06:13:37.0628014Z * [new branch] gh/kwen2501/232/orig -> origin/gh/kwen2501/232/orig 2025-09-07T06:13:37.0629969Z * [new branch] gh/laithsakka/156/base -> origin/gh/laithsakka/156/base 2025-09-07T06:13:37.0631112Z * [new branch] gh/laithsakka/156/head -> origin/gh/laithsakka/156/head 2025-09-07T06:13:37.0632259Z * [new branch] gh/laithsakka/156/orig -> origin/gh/laithsakka/156/orig 2025-09-07T06:13:37.0633943Z * [new branch] gh/laithsakka/160/base -> origin/gh/laithsakka/160/base 2025-09-07T06:13:37.0635033Z * [new branch] gh/laithsakka/160/head -> origin/gh/laithsakka/160/head 2025-09-07T06:13:37.0636141Z * [new branch] gh/laithsakka/160/orig -> origin/gh/laithsakka/160/orig 2025-09-07T06:13:37.0637668Z * [new branch] gh/laithsakka/178/base -> origin/gh/laithsakka/178/base 2025-09-07T06:13:37.0638885Z * [new branch] gh/laithsakka/178/head -> origin/gh/laithsakka/178/head 2025-09-07T06:13:37.0640037Z * [new branch] 
gh/laithsakka/178/orig -> origin/gh/laithsakka/178/orig 2025-09-07T06:13:37.0641600Z * [new branch] gh/laithsakka/191/base -> origin/gh/laithsakka/191/base 2025-09-07T06:13:37.0642720Z * [new branch] gh/laithsakka/191/head -> origin/gh/laithsakka/191/head 2025-09-07T06:13:37.0643821Z * [new branch] gh/laithsakka/191/orig -> origin/gh/laithsakka/191/orig 2025-09-07T06:13:37.0645309Z * [new branch] gh/laithsakka/237/base -> origin/gh/laithsakka/237/base 2025-09-07T06:13:37.0646484Z * [new branch] gh/laithsakka/237/head -> origin/gh/laithsakka/237/head 2025-09-07T06:13:37.0647619Z * [new branch] gh/laithsakka/237/orig -> origin/gh/laithsakka/237/orig 2025-09-07T06:13:37.0649126Z * [new branch] gh/laithsakka/249/base -> origin/gh/laithsakka/249/base 2025-09-07T06:13:37.0650371Z * [new branch] gh/laithsakka/249/head -> origin/gh/laithsakka/249/head 2025-09-07T06:13:37.0651582Z * [new branch] gh/laithsakka/249/orig -> origin/gh/laithsakka/249/orig 2025-09-07T06:13:37.0653485Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-09-07T06:13:37.0654699Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-09-07T06:13:37.0655861Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-09-07T06:13:37.0657535Z * [new branch] gh/laithsakka/254/base -> origin/gh/laithsakka/254/base 2025-09-07T06:13:37.0658626Z * [new branch] gh/laithsakka/254/head -> origin/gh/laithsakka/254/head 2025-09-07T06:13:37.0659773Z * [new branch] gh/laithsakka/254/orig -> origin/gh/laithsakka/254/orig 2025-09-07T06:13:37.0661446Z * [new branch] gh/laithsakka/255/base -> origin/gh/laithsakka/255/base 2025-09-07T06:13:37.0662507Z * [new branch] gh/laithsakka/255/head -> origin/gh/laithsakka/255/head 2025-09-07T06:13:37.0663612Z * [new branch] gh/laithsakka/255/orig -> origin/gh/laithsakka/255/orig 2025-09-07T06:13:37.0665200Z * [new branch] gh/laithsakka/256/base -> origin/gh/laithsakka/256/base 2025-09-07T06:13:37.0666587Z * [new branch] 
gh/laithsakka/256/head -> origin/gh/laithsakka/256/head 2025-09-07T06:13:37.0667609Z * [new branch] gh/laithsakka/256/orig -> origin/gh/laithsakka/256/orig 2025-09-07T06:13:37.0669170Z * [new branch] gh/laithsakka/257/base -> origin/gh/laithsakka/257/base 2025-09-07T06:13:37.0670253Z * [new branch] gh/laithsakka/257/head -> origin/gh/laithsakka/257/head 2025-09-07T06:13:37.0671406Z * [new branch] gh/laithsakka/257/orig -> origin/gh/laithsakka/257/orig 2025-09-07T06:13:37.0672983Z * [new branch] gh/laithsakka/258/base -> origin/gh/laithsakka/258/base 2025-09-07T06:13:37.0674131Z * [new branch] gh/laithsakka/258/head -> origin/gh/laithsakka/258/head 2025-09-07T06:13:37.0675248Z * [new branch] gh/laithsakka/258/orig -> origin/gh/laithsakka/258/orig 2025-09-07T06:13:37.0676836Z * [new branch] gh/laithsakka/259/base -> origin/gh/laithsakka/259/base 2025-09-07T06:13:37.0677950Z * [new branch] gh/laithsakka/259/head -> origin/gh/laithsakka/259/head 2025-09-07T06:13:37.0679097Z * [new branch] gh/laithsakka/259/orig -> origin/gh/laithsakka/259/orig 2025-09-07T06:13:37.0680528Z * [new branch] gh/laithsakka/260/base -> origin/gh/laithsakka/260/base 2025-09-07T06:13:37.0681645Z * [new branch] gh/laithsakka/260/head -> origin/gh/laithsakka/260/head 2025-09-07T06:13:37.0683052Z * [new branch] gh/laithsakka/260/orig -> origin/gh/laithsakka/260/orig 2025-09-07T06:13:37.0684600Z * [new branch] gh/laithsakka/261/base -> origin/gh/laithsakka/261/base 2025-09-07T06:13:37.0685725Z * [new branch] gh/laithsakka/261/head -> origin/gh/laithsakka/261/head 2025-09-07T06:13:37.0686860Z * [new branch] gh/laithsakka/261/orig -> origin/gh/laithsakka/261/orig 2025-09-07T06:13:37.0688841Z * [new branch] gh/laithsakka/262/base -> origin/gh/laithsakka/262/base 2025-09-07T06:13:37.0690426Z * [new branch] gh/laithsakka/262/head -> origin/gh/laithsakka/262/head 2025-09-07T06:13:37.0691616Z * [new branch] gh/laithsakka/262/orig -> origin/gh/laithsakka/262/orig 2025-09-07T06:13:37.0694280Z * [new branch] 
gh/laithsakka/263/base -> origin/gh/laithsakka/263/base 2025-09-07T06:13:37.0695478Z * [new branch] gh/laithsakka/263/head -> origin/gh/laithsakka/263/head 2025-09-07T06:13:37.0696562Z * [new branch] gh/laithsakka/263/orig -> origin/gh/laithsakka/263/orig 2025-09-07T06:13:37.0698152Z * [new branch] gh/laithsakka/264/base -> origin/gh/laithsakka/264/base 2025-09-07T06:13:37.0699335Z * [new branch] gh/laithsakka/264/head -> origin/gh/laithsakka/264/head 2025-09-07T06:13:37.0700512Z * [new branch] gh/laithsakka/264/orig -> origin/gh/laithsakka/264/orig 2025-09-07T06:13:37.0702256Z * [new branch] gh/laithsakka/265/base -> origin/gh/laithsakka/265/base 2025-09-07T06:13:37.0703417Z * [new branch] gh/laithsakka/265/head -> origin/gh/laithsakka/265/head 2025-09-07T06:13:37.0704656Z * [new branch] gh/laithsakka/265/orig -> origin/gh/laithsakka/265/orig 2025-09-07T06:13:37.0706233Z * [new branch] gh/laithsakka/266/base -> origin/gh/laithsakka/266/base 2025-09-07T06:13:37.0707381Z * [new branch] gh/laithsakka/266/head -> origin/gh/laithsakka/266/head 2025-09-07T06:13:37.0708470Z * [new branch] gh/laithsakka/266/orig -> origin/gh/laithsakka/266/orig 2025-09-07T06:13:37.0710031Z * [new branch] gh/laithsakka/267/base -> origin/gh/laithsakka/267/base 2025-09-07T06:13:37.0711208Z * [new branch] gh/laithsakka/267/head -> origin/gh/laithsakka/267/head 2025-09-07T06:13:37.0712476Z * [new branch] gh/laithsakka/267/orig -> origin/gh/laithsakka/267/orig 2025-09-07T06:13:37.0713912Z * [new branch] gh/laithsakka/268/base -> origin/gh/laithsakka/268/base 2025-09-07T06:13:37.0715546Z * [new branch] gh/laithsakka/268/head -> origin/gh/laithsakka/268/head 2025-09-07T06:13:37.0716707Z * [new branch] gh/laithsakka/268/orig -> origin/gh/laithsakka/268/orig 2025-09-07T06:13:37.0718402Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-09-07T06:13:37.0719802Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-09-07T06:13:37.0721261Z * [new branch] 
gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-09-07T06:13:37.0722429Z * [new branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-09-07T06:13:37.0723820Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-09-07T06:13:37.0724879Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-09-07T06:13:37.0726329Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-09-07T06:13:37.0727397Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-09-07T06:13:37.0731277Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-09-07T06:13:37.0732430Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-09-07T06:13:37.0734483Z * [new branch] gh/lucaskabela/10/base -> origin/gh/lucaskabela/10/base 2025-09-07T06:13:37.0735690Z * [new branch] gh/lucaskabela/10/head -> origin/gh/lucaskabela/10/head 2025-09-07T06:13:37.0736846Z * [new branch] gh/lucaskabela/10/orig -> origin/gh/lucaskabela/10/orig 2025-09-07T06:13:37.0738266Z * [new branch] gh/lucaskabela/11/base -> origin/gh/lucaskabela/11/base 2025-09-07T06:13:37.0739462Z * [new branch] gh/lucaskabela/11/head -> origin/gh/lucaskabela/11/head 2025-09-07T06:13:37.0740629Z * [new branch] gh/lucaskabela/11/orig -> origin/gh/lucaskabela/11/orig 2025-09-07T06:13:37.0742025Z * [new branch] gh/lucaskabela/12/base -> origin/gh/lucaskabela/12/base 2025-09-07T06:13:37.0743201Z * [new branch] gh/lucaskabela/12/head -> origin/gh/lucaskabela/12/head 2025-09-07T06:13:37.0744350Z * [new branch] gh/lucaskabela/12/orig -> origin/gh/lucaskabela/12/orig 2025-09-07T06:13:37.0745906Z * [new branch] gh/lucaskabela/13/base -> origin/gh/lucaskabela/13/base 2025-09-07T06:13:37.0747063Z * [new branch] gh/lucaskabela/13/head -> origin/gh/lucaskabela/13/head 2025-09-07T06:13:37.0748189Z * [new branch] gh/lucaskabela/13/orig -> origin/gh/lucaskabela/13/orig 2025-09-07T06:13:37.0749580Z * [new branch] 
gh/lucaskabela/14/base -> origin/gh/lucaskabela/14/base 2025-09-07T06:13:37.0750755Z * [new branch] gh/lucaskabela/14/head -> origin/gh/lucaskabela/14/head 2025-09-07T06:13:37.0751877Z * [new branch] gh/lucaskabela/14/orig -> origin/gh/lucaskabela/14/orig 2025-09-07T06:13:37.0753312Z * [new branch] gh/lucaskabela/15/base -> origin/gh/lucaskabela/15/base 2025-09-07T06:13:37.0754601Z * [new branch] gh/lucaskabela/15/head -> origin/gh/lucaskabela/15/head 2025-09-07T06:13:37.0755754Z * [new branch] gh/lucaskabela/15/orig -> origin/gh/lucaskabela/15/orig 2025-09-07T06:13:37.0757203Z * [new branch] gh/lucaskabela/16/base -> origin/gh/lucaskabela/16/base 2025-09-07T06:13:37.0758368Z * [new branch] gh/lucaskabela/16/head -> origin/gh/lucaskabela/16/head 2025-09-07T06:13:37.0759473Z * [new branch] gh/lucaskabela/16/orig -> origin/gh/lucaskabela/16/orig 2025-09-07T06:13:37.0760960Z * [new branch] gh/lucaskabela/17/base -> origin/gh/lucaskabela/17/base 2025-09-07T06:13:37.0761963Z * [new branch] gh/lucaskabela/17/head -> origin/gh/lucaskabela/17/head 2025-09-07T06:13:37.0763078Z * [new branch] gh/lucaskabela/17/orig -> origin/gh/lucaskabela/17/orig 2025-09-07T06:13:37.0764600Z * [new branch] gh/lucaskabela/2/base -> origin/gh/lucaskabela/2/base 2025-09-07T06:13:37.0765732Z * [new branch] gh/lucaskabela/2/head -> origin/gh/lucaskabela/2/head 2025-09-07T06:13:37.0767352Z * [new branch] gh/lucaskabela/2/orig -> origin/gh/lucaskabela/2/orig 2025-09-07T06:13:37.0768982Z * [new branch] gh/lucaskabela/3/base -> origin/gh/lucaskabela/3/base 2025-09-07T06:13:37.0770069Z * [new branch] gh/lucaskabela/3/head -> origin/gh/lucaskabela/3/head 2025-09-07T06:13:37.0771209Z * [new branch] gh/lucaskabela/3/orig -> origin/gh/lucaskabela/3/orig 2025-09-07T06:13:37.0772706Z * [new branch] gh/lucaskabela/4/base -> origin/gh/lucaskabela/4/base 2025-09-07T06:13:37.0774248Z * [new branch] gh/lucaskabela/4/head -> origin/gh/lucaskabela/4/head 2025-09-07T06:13:37.0775377Z * [new branch] 
gh/lucaskabela/4/orig -> origin/gh/lucaskabela/4/orig 2025-09-07T06:13:37.0776963Z * [new branch] gh/lucaskabela/5/base -> origin/gh/lucaskabela/5/base 2025-09-07T06:13:37.0778072Z * [new branch] gh/lucaskabela/5/head -> origin/gh/lucaskabela/5/head 2025-09-07T06:13:37.0779271Z * [new branch] gh/lucaskabela/5/orig -> origin/gh/lucaskabela/5/orig 2025-09-07T06:13:37.0780759Z * [new branch] gh/lucaskabela/6/base -> origin/gh/lucaskabela/6/base 2025-09-07T06:13:37.0781942Z * [new branch] gh/lucaskabela/6/head -> origin/gh/lucaskabela/6/head 2025-09-07T06:13:37.0783141Z * [new branch] gh/lucaskabela/6/orig -> origin/gh/lucaskabela/6/orig 2025-09-07T06:13:37.0784778Z * [new branch] gh/lucaskabela/7/base -> origin/gh/lucaskabela/7/base 2025-09-07T06:13:37.0785979Z * [new branch] gh/lucaskabela/7/head -> origin/gh/lucaskabela/7/head 2025-09-07T06:13:37.0787107Z * [new branch] gh/lucaskabela/7/orig -> origin/gh/lucaskabela/7/orig 2025-09-07T06:13:37.0788527Z * [new branch] gh/lucaskabela/8/base -> origin/gh/lucaskabela/8/base 2025-09-07T06:13:37.0789762Z * [new branch] gh/lucaskabela/8/head -> origin/gh/lucaskabela/8/head 2025-09-07T06:13:37.0790940Z * [new branch] gh/lucaskabela/8/orig -> origin/gh/lucaskabela/8/orig 2025-09-07T06:13:37.0792899Z * [new branch] gh/lucaskabela/9/base -> origin/gh/lucaskabela/9/base 2025-09-07T06:13:37.0794150Z * [new branch] gh/lucaskabela/9/head -> origin/gh/lucaskabela/9/head 2025-09-07T06:13:37.0795345Z * [new branch] gh/lucaskabela/9/orig -> origin/gh/lucaskabela/9/orig 2025-09-07T06:13:37.0797426Z * [new branch] gh/lw/3/base -> origin/gh/lw/3/base 2025-09-07T06:13:37.0798441Z * [new branch] gh/lw/3/head -> origin/gh/lw/3/head 2025-09-07T06:13:37.0799589Z * [new branch] gh/lw/3/orig -> origin/gh/lw/3/orig 2025-09-07T06:13:37.0801505Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-09-07T06:13:37.0803328Z * [new branch] gh/malfet/330/base -> origin/gh/malfet/330/base 2025-09-07T06:13:37.0804303Z * [new branch] 
gh/malfet/330/head -> origin/gh/malfet/330/head 2025-09-07T06:13:37.0805562Z * [new branch] gh/malfet/330/orig -> origin/gh/malfet/330/orig 2025-09-07T06:13:37.0807120Z * [new branch] gh/malfet/396/base -> origin/gh/malfet/396/base 2025-09-07T06:13:37.0808367Z * [new branch] gh/malfet/396/head -> origin/gh/malfet/396/head 2025-09-07T06:13:37.0809409Z * [new branch] gh/malfet/396/orig -> origin/gh/malfet/396/orig 2025-09-07T06:13:37.0810958Z * [new branch] gh/malfet/397/base -> origin/gh/malfet/397/base 2025-09-07T06:13:37.0812086Z * [new branch] gh/malfet/397/head -> origin/gh/malfet/397/head 2025-09-07T06:13:37.0813545Z * [new branch] gh/malfet/397/orig -> origin/gh/malfet/397/orig 2025-09-07T06:13:37.0815106Z * [new branch] gh/malfet/398/base -> origin/gh/malfet/398/base 2025-09-07T06:13:37.0816194Z * [new branch] gh/malfet/398/head -> origin/gh/malfet/398/head 2025-09-07T06:13:37.0817426Z * [new branch] gh/malfet/398/orig -> origin/gh/malfet/398/orig 2025-09-07T06:13:37.0819003Z * [new branch] gh/malfet/399/base -> origin/gh/malfet/399/base 2025-09-07T06:13:37.0820183Z * [new branch] gh/malfet/399/head -> origin/gh/malfet/399/head 2025-09-07T06:13:37.0821404Z * [new branch] gh/malfet/399/orig -> origin/gh/malfet/399/orig 2025-09-07T06:13:37.0823081Z * [new branch] gh/malfet/414/base -> origin/gh/malfet/414/base 2025-09-07T06:13:37.0824275Z * [new branch] gh/malfet/414/head -> origin/gh/malfet/414/head 2025-09-07T06:13:37.0825549Z * [new branch] gh/malfet/414/orig -> origin/gh/malfet/414/orig 2025-09-07T06:13:37.0827076Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-09-07T06:13:37.0828214Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-09-07T06:13:37.0829369Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-09-07T06:13:37.0830822Z * [new branch] gh/malfet/418/base -> origin/gh/malfet/418/base 2025-09-07T06:13:37.0831973Z * [new branch] gh/malfet/418/head -> origin/gh/malfet/418/head 
2025-09-07T06:13:37.0833089Z * [new branch] gh/malfet/418/orig -> origin/gh/malfet/418/orig 2025-09-07T06:13:37.0834698Z * [new branch] gh/malfet/475/base -> origin/gh/malfet/475/base 2025-09-07T06:13:37.0835971Z * [new branch] gh/malfet/475/head -> origin/gh/malfet/475/head 2025-09-07T06:13:37.0837110Z * [new branch] gh/malfet/475/orig -> origin/gh/malfet/475/orig 2025-09-07T06:13:37.0838632Z * [new branch] gh/malfet/476/base -> origin/gh/malfet/476/base 2025-09-07T06:13:37.0839776Z * [new branch] gh/malfet/476/head -> origin/gh/malfet/476/head 2025-09-07T06:13:37.0840921Z * [new branch] gh/malfet/476/orig -> origin/gh/malfet/476/orig 2025-09-07T06:13:37.0842312Z * [new branch] gh/malfet/477/base -> origin/gh/malfet/477/base 2025-09-07T06:13:37.0843453Z * [new branch] gh/malfet/477/head -> origin/gh/malfet/477/head 2025-09-07T06:13:37.0844634Z * [new branch] gh/malfet/477/orig -> origin/gh/malfet/477/orig 2025-09-07T06:13:37.0846055Z * [new branch] gh/malfet/478/base -> origin/gh/malfet/478/base 2025-09-07T06:13:37.0847204Z * [new branch] gh/malfet/478/head -> origin/gh/malfet/478/head 2025-09-07T06:13:37.0848357Z * [new branch] gh/malfet/478/orig -> origin/gh/malfet/478/orig 2025-09-07T06:13:37.0851733Z * [new branch] gh/malfet/479/base -> origin/gh/malfet/479/base 2025-09-07T06:13:37.0851975Z * [new branch] gh/malfet/479/head -> origin/gh/malfet/479/head 2025-09-07T06:13:37.0852199Z * [new branch] gh/malfet/479/orig -> origin/gh/malfet/479/orig 2025-09-07T06:13:37.0853974Z * [new branch] gh/malfet/480/base -> origin/gh/malfet/480/base 2025-09-07T06:13:37.0855191Z * [new branch] gh/malfet/480/head -> origin/gh/malfet/480/head 2025-09-07T06:13:37.0856336Z * [new branch] gh/malfet/480/orig -> origin/gh/malfet/480/orig 2025-09-07T06:13:37.0857941Z * [new branch] gh/malfet/481/base -> origin/gh/malfet/481/base 2025-09-07T06:13:37.0859062Z * [new branch] gh/malfet/481/head -> origin/gh/malfet/481/head 2025-09-07T06:13:37.0860259Z * [new branch] gh/malfet/481/orig -> 
origin/gh/malfet/481/orig 2025-09-07T06:13:37.0861794Z * [new branch] gh/malfet/482/base -> origin/gh/malfet/482/base 2025-09-07T06:13:37.0862946Z * [new branch] gh/malfet/482/head -> origin/gh/malfet/482/head 2025-09-07T06:13:37.0864189Z * [new branch] gh/malfet/482/orig -> origin/gh/malfet/482/orig 2025-09-07T06:13:37.0866326Z * [new branch] gh/malfet/483/base -> origin/gh/malfet/483/base 2025-09-07T06:13:37.0867947Z * [new branch] gh/malfet/483/head -> origin/gh/malfet/483/head 2025-09-07T06:13:37.0869105Z * [new branch] gh/malfet/483/orig -> origin/gh/malfet/483/orig 2025-09-07T06:13:37.0870750Z * [new branch] gh/malfet/484/base -> origin/gh/malfet/484/base 2025-09-07T06:13:37.0871829Z * [new branch] gh/malfet/484/head -> origin/gh/malfet/484/head 2025-09-07T06:13:37.0873014Z * [new branch] gh/malfet/484/orig -> origin/gh/malfet/484/orig 2025-09-07T06:13:37.0874615Z * [new branch] gh/malfet/485/base -> origin/gh/malfet/485/base 2025-09-07T06:13:37.0875767Z * [new branch] gh/malfet/485/head -> origin/gh/malfet/485/head 2025-09-07T06:13:37.0876952Z * [new branch] gh/malfet/485/orig -> origin/gh/malfet/485/orig 2025-09-07T06:13:37.0878556Z * [new branch] gh/malfet/486/base -> origin/gh/malfet/486/base 2025-09-07T06:13:37.0879642Z * [new branch] gh/malfet/486/head -> origin/gh/malfet/486/head 2025-09-07T06:13:37.0880835Z * [new branch] gh/malfet/486/orig -> origin/gh/malfet/486/orig 2025-09-07T06:13:37.0882321Z * [new branch] gh/malfet/487/base -> origin/gh/malfet/487/base 2025-09-07T06:13:37.0883520Z * [new branch] gh/malfet/487/head -> origin/gh/malfet/487/head 2025-09-07T06:13:37.0884690Z * [new branch] gh/malfet/487/orig -> origin/gh/malfet/487/orig 2025-09-07T06:13:37.0886281Z * [new branch] gh/malfet/488/base -> origin/gh/malfet/488/base 2025-09-07T06:13:37.0887358Z * [new branch] gh/malfet/488/head -> origin/gh/malfet/488/head 2025-09-07T06:13:37.0888551Z * [new branch] gh/malfet/488/orig -> origin/gh/malfet/488/orig 2025-09-07T06:13:37.0890248Z * [new 
branch] gh/malfet/489/base -> origin/gh/malfet/489/base 2025-09-07T06:13:37.0891423Z * [new branch] gh/malfet/489/head -> origin/gh/malfet/489/head 2025-09-07T06:13:37.0893309Z * [new branch] gh/malfet/489/orig -> origin/gh/malfet/489/orig 2025-09-07T06:13:37.0895045Z * [new branch] gh/malfet/490/base -> origin/gh/malfet/490/base 2025-09-07T06:13:37.0896224Z * [new branch] gh/malfet/490/head -> origin/gh/malfet/490/head 2025-09-07T06:13:37.0897473Z * [new branch] gh/malfet/490/orig -> origin/gh/malfet/490/orig 2025-09-07T06:13:37.0899095Z * [new branch] gh/malfet/491/base -> origin/gh/malfet/491/base 2025-09-07T06:13:37.0900320Z * [new branch] gh/malfet/491/head -> origin/gh/malfet/491/head 2025-09-07T06:13:37.0901710Z * [new branch] gh/malfet/491/orig -> origin/gh/malfet/491/orig 2025-09-07T06:13:37.0903323Z * [new branch] gh/malfet/492/base -> origin/gh/malfet/492/base 2025-09-07T06:13:37.0904604Z * [new branch] gh/malfet/492/head -> origin/gh/malfet/492/head 2025-09-07T06:13:37.0905748Z * [new branch] gh/malfet/492/orig -> origin/gh/malfet/492/orig 2025-09-07T06:13:37.0907378Z * [new branch] gh/malfet/493/base -> origin/gh/malfet/493/base 2025-09-07T06:13:37.0908468Z * [new branch] gh/malfet/493/head -> origin/gh/malfet/493/head 2025-09-07T06:13:37.0909660Z * [new branch] gh/malfet/493/orig -> origin/gh/malfet/493/orig 2025-09-07T06:13:37.0911586Z * [new branch] gh/malfet/494/base -> origin/gh/malfet/494/base 2025-09-07T06:13:37.0912848Z * [new branch] gh/malfet/494/head -> origin/gh/malfet/494/head 2025-09-07T06:13:37.0914027Z * [new branch] gh/malfet/494/orig -> origin/gh/malfet/494/orig 2025-09-07T06:13:37.0915474Z * [new branch] gh/malfet/495/base -> origin/gh/malfet/495/base 2025-09-07T06:13:37.0916657Z * [new branch] gh/malfet/495/head -> origin/gh/malfet/495/head 2025-09-07T06:13:37.0917803Z * [new branch] gh/malfet/495/orig -> origin/gh/malfet/495/orig 2025-09-07T06:13:37.0919367Z * [new branch] gh/malfet/496/base -> origin/gh/malfet/496/base 
2025-09-07T06:13:37.0920696Z * [new branch] gh/malfet/496/head -> origin/gh/malfet/496/head 2025-09-07T06:13:37.0921733Z * [new branch] gh/malfet/496/orig -> origin/gh/malfet/496/orig 2025-09-07T06:13:37.0923239Z * [new branch] gh/malfet/497/base -> origin/gh/malfet/497/base 2025-09-07T06:13:37.0924345Z * [new branch] gh/malfet/497/head -> origin/gh/malfet/497/head 2025-09-07T06:13:37.0925575Z * [new branch] gh/malfet/497/orig -> origin/gh/malfet/497/orig 2025-09-07T06:13:37.0927163Z * [new branch] gh/malfet/498/base -> origin/gh/malfet/498/base 2025-09-07T06:13:37.0928254Z * [new branch] gh/malfet/498/head -> origin/gh/malfet/498/head 2025-09-07T06:13:37.0929398Z * [new branch] gh/malfet/498/orig -> origin/gh/malfet/498/orig 2025-09-07T06:13:37.0930845Z * [new branch] gh/malfet/499/base -> origin/gh/malfet/499/base 2025-09-07T06:13:37.0932009Z * [new branch] gh/malfet/499/head -> origin/gh/malfet/499/head 2025-09-07T06:13:37.0933446Z * [new branch] gh/malfet/499/orig -> origin/gh/malfet/499/orig 2025-09-07T06:13:37.0935111Z * [new branch] gh/malfet/500/base -> origin/gh/malfet/500/base 2025-09-07T06:13:37.0936250Z * [new branch] gh/malfet/500/head -> origin/gh/malfet/500/head 2025-09-07T06:13:37.0937426Z * [new branch] gh/malfet/500/orig -> origin/gh/malfet/500/orig 2025-09-07T06:13:37.0939135Z * [new branch] gh/malfet/501/base -> origin/gh/malfet/501/base 2025-09-07T06:13:37.0940288Z * [new branch] gh/malfet/501/head -> origin/gh/malfet/501/head 2025-09-07T06:13:37.0941483Z * [new branch] gh/malfet/501/orig -> origin/gh/malfet/501/orig 2025-09-07T06:13:37.0943074Z * [new branch] gh/malfet/502/base -> origin/gh/malfet/502/base 2025-09-07T06:13:37.0944253Z * [new branch] gh/malfet/502/head -> origin/gh/malfet/502/head 2025-09-07T06:13:37.0945528Z * [new branch] gh/malfet/502/orig -> origin/gh/malfet/502/orig 2025-09-07T06:13:37.0947090Z * [new branch] gh/malfet/503/base -> origin/gh/malfet/503/base 2025-09-07T06:13:37.0948224Z * [new branch] gh/malfet/503/head -> 
origin/gh/malfet/503/head 2025-09-07T06:13:37.0949371Z * [new branch] gh/malfet/503/orig -> origin/gh/malfet/503/orig 2025-09-07T06:13:37.0951032Z * [new branch] gh/malfet/504/base -> origin/gh/malfet/504/base 2025-09-07T06:13:37.0952154Z * [new branch] gh/malfet/504/head -> origin/gh/malfet/504/head 2025-09-07T06:13:37.0953298Z * [new branch] gh/malfet/504/orig -> origin/gh/malfet/504/orig 2025-09-07T06:13:37.0954922Z * [new branch] gh/malfet/505/base -> origin/gh/malfet/505/base 2025-09-07T06:13:37.0956022Z * [new branch] gh/malfet/505/head -> origin/gh/malfet/505/head 2025-09-07T06:13:37.0957385Z * [new branch] gh/malfet/505/orig -> origin/gh/malfet/505/orig 2025-09-07T06:13:37.0959001Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-09-07T06:13:37.0960145Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-09-07T06:13:37.0961729Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-09-07T06:13:37.0963317Z * [new branch] gh/malfet/507/base -> origin/gh/malfet/507/base 2025-09-07T06:13:37.0964444Z * [new branch] gh/malfet/507/head -> origin/gh/malfet/507/head 2025-09-07T06:13:37.0965593Z * [new branch] gh/malfet/507/orig -> origin/gh/malfet/507/orig 2025-09-07T06:13:37.0967253Z * [new branch] gh/malfet/508/base -> origin/gh/malfet/508/base 2025-09-07T06:13:37.0968424Z * [new branch] gh/malfet/508/head -> origin/gh/malfet/508/head 2025-09-07T06:13:37.0969622Z * [new branch] gh/malfet/508/orig -> origin/gh/malfet/508/orig 2025-09-07T06:13:37.0971070Z * [new branch] gh/malfet/509/base -> origin/gh/malfet/509/base 2025-09-07T06:13:37.0972180Z * [new branch] gh/malfet/509/head -> origin/gh/malfet/509/head 2025-09-07T06:13:37.0973766Z * [new branch] gh/malfet/509/orig -> origin/gh/malfet/509/orig 2025-09-07T06:13:37.0975464Z * [new branch] gh/malfet/510/base -> origin/gh/malfet/510/base 2025-09-07T06:13:37.0976640Z * [new branch] gh/malfet/510/head -> origin/gh/malfet/510/head 2025-09-07T06:13:37.0977798Z * [new 
branch] gh/malfet/510/orig -> origin/gh/malfet/510/orig 2025-09-07T06:13:37.0979437Z * [new branch] gh/malfet/511/base -> origin/gh/malfet/511/base 2025-09-07T06:13:37.0980608Z * [new branch] gh/malfet/511/head -> origin/gh/malfet/511/head 2025-09-07T06:13:37.0981788Z * [new branch] gh/malfet/511/orig -> origin/gh/malfet/511/orig 2025-09-07T06:13:37.0983360Z * [new branch] gh/malfet/512/base -> origin/gh/malfet/512/base 2025-09-07T06:13:37.0984515Z * [new branch] gh/malfet/512/head -> origin/gh/malfet/512/head 2025-09-07T06:13:37.0985802Z * [new branch] gh/malfet/512/orig -> origin/gh/malfet/512/orig 2025-09-07T06:13:37.0987378Z * [new branch] gh/malfet/513/base -> origin/gh/malfet/513/base 2025-09-07T06:13:37.0988496Z * [new branch] gh/malfet/513/head -> origin/gh/malfet/513/head 2025-09-07T06:13:37.0989639Z * [new branch] gh/malfet/513/orig -> origin/gh/malfet/513/orig 2025-09-07T06:13:37.0991250Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-09-07T06:13:37.0993593Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-09-07T06:13:37.0996438Z * [new branch] gh/manuelcandales/10/base -> origin/gh/manuelcandales/10/base 2025-09-07T06:13:37.0997712Z * [new branch] gh/manuelcandales/10/head -> origin/gh/manuelcandales/10/head 2025-09-07T06:13:37.0998927Z * [new branch] gh/manuelcandales/10/orig -> origin/gh/manuelcandales/10/orig 2025-09-07T06:13:37.1000692Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-09-07T06:13:37.1001932Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-09-07T06:13:37.1003591Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-09-07T06:13:37.1005302Z * [new branch] gh/manuelcandales/9/base -> origin/gh/manuelcandales/9/base 2025-09-07T06:13:37.1006444Z * [new branch] gh/manuelcandales/9/head -> origin/gh/manuelcandales/9/head 2025-09-07T06:13:37.1007623Z * [new branch] gh/manuelcandales/9/orig -> 
origin/gh/manuelcandales/9/orig 2025-09-07T06:13:37.1010160Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-09-07T06:13:37.1012182Z * [new branch] gh/masnesral/204/base -> origin/gh/masnesral/204/base 2025-09-07T06:13:37.1013811Z * [new branch] gh/masnesral/204/head -> origin/gh/masnesral/204/head 2025-09-07T06:13:37.1015079Z * [new branch] gh/masnesral/204/orig -> origin/gh/masnesral/204/orig 2025-09-07T06:13:37.1016910Z * [new branch] gh/masnesral/235/base -> origin/gh/masnesral/235/base 2025-09-07T06:13:37.1017991Z * [new branch] gh/masnesral/235/head -> origin/gh/masnesral/235/head 2025-09-07T06:13:37.1019207Z * [new branch] gh/masnesral/235/orig -> origin/gh/masnesral/235/orig 2025-09-07T06:13:37.1020822Z * [new branch] gh/masnesral/34/base -> origin/gh/masnesral/34/base 2025-09-07T06:13:37.1022920Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-09-07T06:13:37.1024164Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-09-07T06:13:37.1025658Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-09-07T06:13:37.1026856Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-09-07T06:13:37.1028222Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-09-07T06:13:37.1029397Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-09-07T06:13:37.1030845Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-09-07T06:13:37.1031905Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-09-07T06:13:37.1033283Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-09-07T06:13:37.1034383Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-09-07T06:13:37.1035811Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-09-07T06:13:37.1037172Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-09-07T06:13:37.1038240Z * [new branch] gh/mhorowitz/6/base -> 
origin/gh/mhorowitz/6/base 2025-09-07T06:13:37.1039248Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-09-07T06:13:37.1041153Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-09-07T06:13:37.1042338Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-09-07T06:13:37.1043820Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-09-07T06:13:37.1044919Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-09-07T06:13:37.1046347Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-09-07T06:13:37.1047414Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-09-07T06:13:37.1048969Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-09-07T06:13:37.1050052Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-09-07T06:13:37.1051570Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-09-07T06:13:37.1052749Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-09-07T06:13:37.1054547Z * [new branch] gh/mikaylagawarecki/317/base -> origin/gh/mikaylagawarecki/317/base 2025-09-07T06:13:37.1055750Z * [new branch] gh/mikaylagawarecki/317/head -> origin/gh/mikaylagawarecki/317/head 2025-09-07T06:13:37.1056984Z * [new branch] gh/mikaylagawarecki/317/orig -> origin/gh/mikaylagawarecki/317/orig 2025-09-07T06:13:37.1058582Z * [new branch] gh/mikaylagawarecki/320/base -> origin/gh/mikaylagawarecki/320/base 2025-09-07T06:13:37.1059724Z * [new branch] gh/mikaylagawarecki/320/head -> origin/gh/mikaylagawarecki/320/head 2025-09-07T06:13:37.1060911Z * [new branch] gh/mikaylagawarecki/320/orig -> origin/gh/mikaylagawarecki/320/orig 2025-09-07T06:13:37.1062412Z * [new branch] gh/mikaylagawarecki/329/base -> 
origin/gh/mikaylagawarecki/329/base 2025-09-07T06:13:37.1063605Z * [new branch] gh/mikaylagawarecki/329/head -> origin/gh/mikaylagawarecki/329/head 2025-09-07T06:13:37.1064825Z * [new branch] gh/mikaylagawarecki/329/orig -> origin/gh/mikaylagawarecki/329/orig 2025-09-07T06:13:37.1066635Z * [new branch] gh/mikaylagawarecki/330/base -> origin/gh/mikaylagawarecki/330/base 2025-09-07T06:13:37.1067807Z * [new branch] gh/mikaylagawarecki/330/head -> origin/gh/mikaylagawarecki/330/head 2025-09-07T06:13:37.1069000Z * [new branch] gh/mikaylagawarecki/330/orig -> origin/gh/mikaylagawarecki/330/orig 2025-09-07T06:13:37.1070568Z * [new branch] gh/mikaylagawarecki/331/base -> origin/gh/mikaylagawarecki/331/base 2025-09-07T06:13:37.1071848Z * [new branch] gh/mikaylagawarecki/331/head -> origin/gh/mikaylagawarecki/331/head 2025-09-07T06:13:37.1072936Z * [new branch] gh/mikaylagawarecki/331/orig -> origin/gh/mikaylagawarecki/331/orig 2025-09-07T06:13:37.1074858Z * [new branch] gh/mikaylagawarecki/332/base -> origin/gh/mikaylagawarecki/332/base 2025-09-07T06:13:37.1075936Z * [new branch] gh/mikaylagawarecki/332/head -> origin/gh/mikaylagawarecki/332/head 2025-09-07T06:13:37.1076992Z * [new branch] gh/mikaylagawarecki/332/orig -> origin/gh/mikaylagawarecki/332/orig 2025-09-07T06:13:37.1078536Z * [new branch] gh/mikaylagawarecki/334/base -> origin/gh/mikaylagawarecki/334/base 2025-09-07T06:13:37.1079603Z * [new branch] gh/mikaylagawarecki/334/head -> origin/gh/mikaylagawarecki/334/head 2025-09-07T06:13:37.1080830Z * [new branch] gh/mikaylagawarecki/334/orig -> origin/gh/mikaylagawarecki/334/orig 2025-09-07T06:13:37.1082405Z * [new branch] gh/mikaylagawarecki/335/base -> origin/gh/mikaylagawarecki/335/base 2025-09-07T06:13:37.1083572Z * [new branch] gh/mikaylagawarecki/335/head -> origin/gh/mikaylagawarecki/335/head 2025-09-07T06:13:37.1084701Z * [new branch] gh/mikaylagawarecki/335/orig -> origin/gh/mikaylagawarecki/335/orig 2025-09-07T06:13:37.1086262Z * [new branch] 
gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-09-07T06:13:37.1087366Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-09-07T06:13:37.1088532Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-09-07T06:13:37.1089925Z * [new branch] gh/mikaylagawarecki/337/base -> origin/gh/mikaylagawarecki/337/base 2025-09-07T06:13:37.1091022Z * [new branch] gh/mikaylagawarecki/337/head -> origin/gh/mikaylagawarecki/337/head 2025-09-07T06:13:37.1092346Z * [new branch] gh/mikaylagawarecki/337/orig -> origin/gh/mikaylagawarecki/337/orig 2025-09-07T06:13:37.1094298Z * [new branch] gh/mikaylagawarecki/338/base -> origin/gh/mikaylagawarecki/338/base 2025-09-07T06:13:37.1095490Z * [new branch] gh/mikaylagawarecki/338/head -> origin/gh/mikaylagawarecki/338/head 2025-09-07T06:13:37.1096757Z * [new branch] gh/mikaylagawarecki/338/orig -> origin/gh/mikaylagawarecki/338/orig 2025-09-07T06:13:37.1098635Z * [new branch] gh/mikaylagawarecki/339/base -> origin/gh/mikaylagawarecki/339/base 2025-09-07T06:13:37.1099855Z * [new branch] gh/mikaylagawarecki/339/head -> origin/gh/mikaylagawarecki/339/head 2025-09-07T06:13:37.1101031Z * [new branch] gh/mikaylagawarecki/339/orig -> origin/gh/mikaylagawarecki/339/orig 2025-09-07T06:13:37.1102949Z * [new branch] gh/mlazos/1/base -> origin/gh/mlazos/1/base 2025-09-07T06:13:37.1104193Z * [new branch] gh/mlazos/1/head -> origin/gh/mlazos/1/head 2025-09-07T06:13:37.1105445Z * [new branch] gh/mlazos/1/orig -> origin/gh/mlazos/1/orig 2025-09-07T06:13:37.1107065Z * [new branch] gh/mlazos/12/base -> origin/gh/mlazos/12/base 2025-09-07T06:13:37.1108193Z * [new branch] gh/mlazos/12/head -> origin/gh/mlazos/12/head 2025-09-07T06:13:37.1109305Z * [new branch] gh/mlazos/12/orig -> origin/gh/mlazos/12/orig 2025-09-07T06:13:37.1110928Z * [new branch] gh/mlazos/13/base -> origin/gh/mlazos/13/base 2025-09-07T06:13:37.1112132Z * [new branch] gh/mlazos/13/head -> 
origin/gh/mlazos/13/head 2025-09-07T06:13:37.1113253Z * [new branch] gh/mlazos/13/orig -> origin/gh/mlazos/13/orig 2025-09-07T06:13:37.1114840Z * [new branch] gh/mlazos/14/base -> origin/gh/mlazos/14/base 2025-09-07T06:13:37.1115968Z * [new branch] gh/mlazos/14/head -> origin/gh/mlazos/14/head 2025-09-07T06:13:37.1117071Z * [new branch] gh/mlazos/14/orig -> origin/gh/mlazos/14/orig 2025-09-07T06:13:37.1118697Z * [new branch] gh/mlazos/15/base -> origin/gh/mlazos/15/base 2025-09-07T06:13:37.1119832Z * [new branch] gh/mlazos/15/head -> origin/gh/mlazos/15/head 2025-09-07T06:13:37.1120986Z * [new branch] gh/mlazos/15/orig -> origin/gh/mlazos/15/orig 2025-09-07T06:13:37.1122508Z * [new branch] gh/mlazos/16/base -> origin/gh/mlazos/16/base 2025-09-07T06:13:37.1123758Z * [new branch] gh/mlazos/16/head -> origin/gh/mlazos/16/head 2025-09-07T06:13:37.1124880Z * [new branch] gh/mlazos/16/orig -> origin/gh/mlazos/16/orig 2025-09-07T06:13:37.1126327Z * [new branch] gh/mlazos/17/base -> origin/gh/mlazos/17/base 2025-09-07T06:13:37.1127493Z * [new branch] gh/mlazos/17/head -> origin/gh/mlazos/17/head 2025-09-07T06:13:37.1128561Z * [new branch] gh/mlazos/17/orig -> origin/gh/mlazos/17/orig 2025-09-07T06:13:37.1130170Z * [new branch] gh/mlazos/2/base -> origin/gh/mlazos/2/base 2025-09-07T06:13:37.1131243Z * [new branch] gh/mlazos/2/head -> origin/gh/mlazos/2/head 2025-09-07T06:13:37.1132272Z * [new branch] gh/mlazos/2/orig -> origin/gh/mlazos/2/orig 2025-09-07T06:13:37.1134701Z * [new branch] gh/mlazos/3/base -> origin/gh/mlazos/3/base 2025-09-07T06:13:37.1135799Z * [new branch] gh/mlazos/3/head -> origin/gh/mlazos/3/head 2025-09-07T06:13:37.1136970Z * [new branch] gh/mlazos/3/orig -> origin/gh/mlazos/3/orig 2025-09-07T06:13:37.1139343Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-09-07T06:13:37.1140866Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-09-07T06:13:37.1142841Z * [new branch] gh/muchulee8/62/base -> origin/gh/muchulee8/62/base 
2025-09-07T06:13:37.1144159Z * [new branch] gh/muchulee8/62/head -> origin/gh/muchulee8/62/head 2025-09-07T06:13:37.1145514Z * [new branch] gh/muchulee8/62/orig -> origin/gh/muchulee8/62/orig 2025-09-07T06:13:37.1147052Z * [new branch] gh/muchulee8/63/base -> origin/gh/muchulee8/63/base 2025-09-07T06:13:37.1148210Z * [new branch] gh/muchulee8/63/head -> origin/gh/muchulee8/63/head 2025-09-07T06:13:37.1149366Z * [new branch] gh/muchulee8/63/orig -> origin/gh/muchulee8/63/orig 2025-09-07T06:13:37.1151120Z * [new branch] gh/muchulee8/64/base -> origin/gh/muchulee8/64/base 2025-09-07T06:13:37.1152210Z * [new branch] gh/muchulee8/64/head -> origin/gh/muchulee8/64/head 2025-09-07T06:13:37.1153399Z * [new branch] gh/muchulee8/64/orig -> origin/gh/muchulee8/64/orig 2025-09-07T06:13:37.1155055Z * [new branch] gh/muchulee8/65/base -> origin/gh/muchulee8/65/base 2025-09-07T06:13:37.1156148Z * [new branch] gh/muchulee8/65/head -> origin/gh/muchulee8/65/head 2025-09-07T06:13:37.1157366Z * [new branch] gh/muchulee8/65/orig -> origin/gh/muchulee8/65/orig 2025-09-07T06:13:37.1159387Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-09-07T06:13:37.1160520Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-09-07T06:13:37.1161745Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-09-07T06:13:37.1163265Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-09-07T06:13:37.1164405Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-09-07T06:13:37.1166087Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-09-07T06:13:37.1168069Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-09-07T06:13:37.1169196Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-09-07T06:13:37.1170358Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 
2025-09-07T06:13:37.1171834Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-09-07T06:13:37.1173166Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-09-07T06:13:37.1174623Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-09-07T06:13:37.1176301Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-09-07T06:13:37.1177470Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-09-07T06:13:37.1178726Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-09-07T06:13:37.1180295Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-09-07T06:13:37.1181411Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-09-07T06:13:37.1182482Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-09-07T06:13:37.1184404Z * [new branch] gh/oulgen/35/base -> origin/gh/oulgen/35/base 2025-09-07T06:13:37.1185662Z * [new branch] gh/oulgen/35/head -> origin/gh/oulgen/35/head 2025-09-07T06:13:37.1186813Z * [new branch] gh/oulgen/35/orig -> origin/gh/oulgen/35/orig 2025-09-07T06:13:37.1188325Z * [new branch] gh/oulgen/48/base -> origin/gh/oulgen/48/base 2025-09-07T06:13:37.1189529Z * [new branch] gh/oulgen/48/head -> origin/gh/oulgen/48/head 2025-09-07T06:13:37.1190680Z * [new branch] gh/oulgen/48/orig -> origin/gh/oulgen/48/orig 2025-09-07T06:13:37.1192147Z * [new branch] gh/oulgen/49/base -> origin/gh/oulgen/49/base 2025-09-07T06:13:37.1193822Z * [new branch] gh/oulgen/49/head -> origin/gh/oulgen/49/head 2025-09-07T06:13:37.1194978Z * [new branch] gh/oulgen/49/orig -> origin/gh/oulgen/49/orig 2025-09-07T06:13:37.1197078Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-09-07T06:13:37.1198313Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-09-07T06:13:37.1199570Z * [new branch] gh/pearu/108/orig -> 
origin/gh/pearu/108/orig 2025-09-07T06:13:37.1201180Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-09-07T06:13:37.1202350Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-09-07T06:13:37.1203545Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-09-07T06:13:37.1205170Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-09-07T06:13:37.1206479Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-09-07T06:13:37.1207601Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-09-07T06:13:37.1209225Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-09-07T06:13:37.1210331Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-09-07T06:13:37.1211624Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-09-07T06:13:37.1213941Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-09-07T06:13:37.1215114Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-09-07T06:13:37.1216257Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-09-07T06:13:37.1217836Z * [new branch] gh/pearu/113/base -> origin/gh/pearu/113/base 2025-09-07T06:13:37.1219020Z * [new branch] gh/pearu/113/head -> origin/gh/pearu/113/head 2025-09-07T06:13:37.1220188Z * [new branch] gh/pearu/113/orig -> origin/gh/pearu/113/orig 2025-09-07T06:13:37.1221789Z * [new branch] gh/pearu/114/base -> origin/gh/pearu/114/base 2025-09-07T06:13:37.1223096Z * [new branch] gh/pearu/114/head -> origin/gh/pearu/114/head 2025-09-07T06:13:37.1224400Z * [new branch] gh/pearu/114/orig -> origin/gh/pearu/114/orig 2025-09-07T06:13:37.1226066Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-09-07T06:13:37.1227247Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-09-07T06:13:37.1228748Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-09-07T06:13:37.1230993Z * [new branch] gh/pearu/116/base -> 
origin/gh/pearu/116/base 2025-09-07T06:13:37.1232091Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-09-07T06:13:37.1233305Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-09-07T06:13:37.1234861Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-09-07T06:13:37.1235884Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-09-07T06:13:37.1236917Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-09-07T06:13:37.1238901Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-09-07T06:13:37.1240448Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-09-07T06:13:37.1241530Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-09-07T06:13:37.1243870Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-09-07T06:13:37.1245012Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-09-07T06:13:37.1246136Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-09-07T06:13:37.1248098Z * [new branch] gh/qqaatw/29/base -> origin/gh/qqaatw/29/base 2025-09-07T06:13:37.1249221Z * [new branch] gh/qqaatw/29/head -> origin/gh/qqaatw/29/head 2025-09-07T06:13:37.1250317Z * [new branch] gh/qqaatw/29/orig -> origin/gh/qqaatw/29/orig 2025-09-07T06:13:37.1251907Z * [new branch] gh/raymo/refresh-script -> origin/gh/raymo/refresh-script 2025-09-07T06:13:37.1254026Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-09-07T06:13:37.1255265Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-09-07T06:13:37.1256812Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-09-07T06:13:37.1257973Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-09-07T06:13:37.1259182Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-09-07T06:13:37.1260734Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-09-07T06:13:37.1261867Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 
2025-09-07T06:13:37.1263080Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-09-07T06:13:37.1264686Z * [new branch] gh/rec/156/base -> origin/gh/rec/156/base 2025-09-07T06:13:37.1265943Z * [new branch] gh/rec/156/head -> origin/gh/rec/156/head 2025-09-07T06:13:37.1267064Z * [new branch] gh/rec/156/orig -> origin/gh/rec/156/orig 2025-09-07T06:13:37.1268547Z * [new branch] gh/rec/160/base -> origin/gh/rec/160/base 2025-09-07T06:13:37.1269678Z * [new branch] gh/rec/160/head -> origin/gh/rec/160/head 2025-09-07T06:13:37.1270884Z * [new branch] gh/rec/160/orig -> origin/gh/rec/160/orig 2025-09-07T06:13:37.1272420Z * [new branch] gh/rec/162/base -> origin/gh/rec/162/base 2025-09-07T06:13:37.1273563Z * [new branch] gh/rec/162/head -> origin/gh/rec/162/head 2025-09-07T06:13:37.1274757Z * [new branch] gh/rec/162/orig -> origin/gh/rec/162/orig 2025-09-07T06:13:37.1276198Z * [new branch] gh/rec/163/base -> origin/gh/rec/163/base 2025-09-07T06:13:37.1277327Z * [new branch] gh/rec/163/head -> origin/gh/rec/163/head 2025-09-07T06:13:37.1278415Z * [new branch] gh/rec/163/orig -> origin/gh/rec/163/orig 2025-09-07T06:13:37.1279847Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-09-07T06:13:37.1280949Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-09-07T06:13:37.1282160Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-09-07T06:13:37.1283729Z * [new branch] gh/rec/165/base -> origin/gh/rec/165/base 2025-09-07T06:13:37.1284860Z * [new branch] gh/rec/165/head -> origin/gh/rec/165/head 2025-09-07T06:13:37.1286083Z * [new branch] gh/rec/165/orig -> origin/gh/rec/165/orig 2025-09-07T06:13:37.1287588Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-09-07T06:13:37.1288803Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-09-07T06:13:37.1289866Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-09-07T06:13:37.1291744Z * [new branch] gh/robert-hardwick/1/base -> origin/gh/robert-hardwick/1/base 
2025-09-07T06:13:37.1293537Z * [new branch] gh/robert-hardwick/1/head -> origin/gh/robert-hardwick/1/head 2025-09-07T06:13:37.1294669Z * [new branch] gh/robert-hardwick/1/orig -> origin/gh/robert-hardwick/1/orig 2025-09-07T06:13:37.1296328Z * [new branch] gh/robert-hardwick/2/base -> origin/gh/robert-hardwick/2/base 2025-09-07T06:13:37.1297496Z * [new branch] gh/robert-hardwick/2/head -> origin/gh/robert-hardwick/2/head 2025-09-07T06:13:37.1298717Z * [new branch] gh/robert-hardwick/2/orig -> origin/gh/robert-hardwick/2/orig 2025-09-07T06:13:37.1300324Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-09-07T06:13:37.1301622Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-09-07T06:13:37.1302768Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-09-07T06:13:37.1304488Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-09-07T06:13:37.1305597Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-09-07T06:13:37.1306715Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-09-07T06:13:37.1308508Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-09-07T06:13:37.1309624Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-09-07T06:13:37.1311249Z * [new branch] gh/rtimpe/10/base -> origin/gh/rtimpe/10/base 2025-09-07T06:13:37.1312423Z * [new branch] gh/rtimpe/10/head -> origin/gh/rtimpe/10/head 2025-09-07T06:13:37.1313522Z * [new branch] gh/rtimpe/10/orig -> origin/gh/rtimpe/10/orig 2025-09-07T06:13:37.1315078Z * [new branch] gh/rtimpe/11/base -> origin/gh/rtimpe/11/base 2025-09-07T06:13:37.1316292Z * [new branch] gh/rtimpe/11/head -> origin/gh/rtimpe/11/head 2025-09-07T06:13:37.1317414Z * [new branch] gh/rtimpe/11/orig -> origin/gh/rtimpe/11/orig 2025-09-07T06:13:37.1318925Z * [new branch] gh/rtimpe/12/base -> origin/gh/rtimpe/12/base 
2025-09-07T06:13:37.1320009Z * [new branch] gh/rtimpe/12/head -> origin/gh/rtimpe/12/head 2025-09-07T06:13:37.1321145Z * [new branch] gh/rtimpe/12/orig -> origin/gh/rtimpe/12/orig 2025-09-07T06:13:37.1322620Z * [new branch] gh/rtimpe/13/base -> origin/gh/rtimpe/13/base 2025-09-07T06:13:37.1323796Z * [new branch] gh/rtimpe/13/head -> origin/gh/rtimpe/13/head 2025-09-07T06:13:37.1324894Z * [new branch] gh/rtimpe/13/orig -> origin/gh/rtimpe/13/orig 2025-09-07T06:13:37.1326409Z * [new branch] gh/rtimpe/14/base -> origin/gh/rtimpe/14/base 2025-09-07T06:13:37.1327480Z * [new branch] gh/rtimpe/14/head -> origin/gh/rtimpe/14/head 2025-09-07T06:13:37.1328600Z * [new branch] gh/rtimpe/14/orig -> origin/gh/rtimpe/14/orig 2025-09-07T06:13:37.1330105Z * [new branch] gh/rtimpe/15/base -> origin/gh/rtimpe/15/base 2025-09-07T06:13:37.1331307Z * [new branch] gh/rtimpe/15/head -> origin/gh/rtimpe/15/head 2025-09-07T06:13:37.1332480Z * [new branch] gh/rtimpe/15/orig -> origin/gh/rtimpe/15/orig 2025-09-07T06:13:37.1334329Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-09-07T06:13:37.1335494Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-09-07T06:13:37.1336898Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-09-07T06:13:37.1337931Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-09-07T06:13:37.1339527Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-09-07T06:13:37.1340710Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-09-07T06:13:37.1342344Z * [new branch] gh/rtimpe/9/base -> origin/gh/rtimpe/9/base 2025-09-07T06:13:37.1343472Z * [new branch] gh/rtimpe/9/head -> origin/gh/rtimpe/9/head 2025-09-07T06:13:37.1344634Z * [new branch] gh/rtimpe/9/orig -> origin/gh/rtimpe/9/orig 2025-09-07T06:13:37.1346745Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-09-07T06:13:37.1347901Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 
2025-09-07T06:13:37.1348994Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-09-07T06:13:37.1350592Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-09-07T06:13:37.1351779Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-09-07T06:13:37.1352929Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-09-07T06:13:37.1354411Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-09-07T06:13:37.1355532Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-09-07T06:13:37.1356680Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-09-07T06:13:37.1358145Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-09-07T06:13:37.1359276Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-09-07T06:13:37.1360380Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-09-07T06:13:37.1361959Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-09-07T06:13:37.1363131Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-09-07T06:13:37.1364222Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-09-07T06:13:37.1365631Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-09-07T06:13:37.1366801Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-09-07T06:13:37.1367926Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-09-07T06:13:37.1369398Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-09-07T06:13:37.1370517Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-09-07T06:13:37.1371645Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-09-07T06:13:37.1373821Z * [new branch] gh/sarckk/2/base 
-> origin/gh/sarckk/2/base 2025-09-07T06:13:37.1374991Z * [new branch] gh/sarckk/2/head -> origin/gh/sarckk/2/head 2025-09-07T06:13:37.1376286Z * [new branch] gh/sarckk/2/orig -> origin/gh/sarckk/2/orig 2025-09-07T06:13:37.1378221Z * [new branch] gh/seemethere/35/base -> origin/gh/seemethere/35/base 2025-09-07T06:13:37.1379400Z * [new branch] gh/seemethere/35/head -> origin/gh/seemethere/35/head 2025-09-07T06:13:37.1380587Z * [new branch] gh/seemethere/35/orig -> origin/gh/seemethere/35/orig 2025-09-07T06:13:37.1382243Z * [new branch] gh/seemethere/37/base -> origin/gh/seemethere/37/base 2025-09-07T06:13:37.1383298Z * [new branch] gh/seemethere/37/head -> origin/gh/seemethere/37/head 2025-09-07T06:13:37.1384470Z * [new branch] gh/seemethere/37/orig -> origin/gh/seemethere/37/orig 2025-09-07T06:13:37.1386099Z * [new branch] gh/seemethere/43/base -> origin/gh/seemethere/43/base 2025-09-07T06:13:37.1387202Z * [new branch] gh/seemethere/43/head -> origin/gh/seemethere/43/head 2025-09-07T06:13:37.1388398Z * [new branch] gh/seemethere/43/orig -> origin/gh/seemethere/43/orig 2025-09-07T06:13:37.1389888Z * [new branch] gh/seemethere/44/base -> origin/gh/seemethere/44/base 2025-09-07T06:13:37.1391097Z * [new branch] gh/seemethere/44/head -> origin/gh/seemethere/44/head 2025-09-07T06:13:37.1392511Z * [new branch] gh/seemethere/44/orig -> origin/gh/seemethere/44/orig 2025-09-07T06:13:37.1394275Z * [new branch] gh/seemethere/48/base -> origin/gh/seemethere/48/base 2025-09-07T06:13:37.1395390Z * [new branch] gh/seemethere/48/head -> origin/gh/seemethere/48/head 2025-09-07T06:13:37.1396575Z * [new branch] gh/seemethere/48/orig -> origin/gh/seemethere/48/orig 2025-09-07T06:13:37.1398122Z * [new branch] gh/seemethere/49/base -> origin/gh/seemethere/49/base 2025-09-07T06:13:37.1399278Z * [new branch] gh/seemethere/49/head -> origin/gh/seemethere/49/head 2025-09-07T06:13:37.1400447Z * [new branch] gh/seemethere/49/orig -> origin/gh/seemethere/49/orig 2025-09-07T06:13:37.1401981Z * 
[new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-09-07T06:13:37.1403171Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-09-07T06:13:37.1404468Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-09-07T06:13:37.1405944Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-09-07T06:13:37.1407149Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-09-07T06:13:37.1408300Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-09-07T06:13:37.1409820Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-09-07T06:13:37.1410945Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-09-07T06:13:37.1412178Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-09-07T06:13:37.1414014Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-09-07T06:13:37.1415085Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-09-07T06:13:37.1416453Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-09-07T06:13:37.1418002Z * [new branch] gh/seemethere/56/base -> origin/gh/seemethere/56/base 2025-09-07T06:13:37.1419164Z * [new branch] gh/seemethere/56/head -> origin/gh/seemethere/56/head 2025-09-07T06:13:37.1420777Z * [new branch] gh/seemethere/56/orig -> origin/gh/seemethere/56/orig 2025-09-07T06:13:37.1422315Z * [new branch] gh/seemethere/57/base -> origin/gh/seemethere/57/base 2025-09-07T06:13:37.1423614Z * [new branch] gh/seemethere/57/head -> origin/gh/seemethere/57/head 2025-09-07T06:13:37.1424735Z * [new branch] gh/seemethere/57/orig -> origin/gh/seemethere/57/orig 2025-09-07T06:13:37.1426355Z * [new branch] gh/seemethere/58/base -> origin/gh/seemethere/58/base 2025-09-07T06:13:37.1427450Z * [new branch] gh/seemethere/58/head -> origin/gh/seemethere/58/head 2025-09-07T06:13:37.1428650Z * [new branch] gh/seemethere/58/orig -> 
origin/gh/seemethere/58/orig 2025-09-07T06:13:37.1429958Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-09-07T06:13:37.1431088Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-09-07T06:13:37.1432173Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-09-07T06:13:37.1434132Z * [new branch] gh/seemethere/60/base -> origin/gh/seemethere/60/base 2025-09-07T06:13:37.1435258Z * [new branch] gh/seemethere/60/head -> origin/gh/seemethere/60/head 2025-09-07T06:13:37.1436407Z * [new branch] gh/seemethere/60/orig -> origin/gh/seemethere/60/orig 2025-09-07T06:13:37.1437912Z * [new branch] gh/seemethere/61/base -> origin/gh/seemethere/61/base 2025-09-07T06:13:37.1439129Z * [new branch] gh/seemethere/61/head -> origin/gh/seemethere/61/head 2025-09-07T06:13:37.1440277Z * [new branch] gh/seemethere/61/orig -> origin/gh/seemethere/61/orig 2025-09-07T06:13:37.1441797Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-09-07T06:13:37.1442906Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-09-07T06:13:37.1443997Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-09-07T06:13:37.1445513Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-09-07T06:13:37.1446585Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-09-07T06:13:37.1447775Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-09-07T06:13:37.1449825Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-09-07T06:13:37.1451115Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-09-07T06:13:37.1452299Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-09-07T06:13:37.1454679Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-09-07T06:13:37.1455979Z * [new branch] gh/shunting314/176/head -> 
origin/gh/shunting314/176/head 2025-09-07T06:13:37.1457324Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-09-07T06:13:37.1458895Z * [new branch] gh/shunting314/211/base -> origin/gh/shunting314/211/base 2025-09-07T06:13:37.1460055Z * [new branch] gh/shunting314/211/head -> origin/gh/shunting314/211/head 2025-09-07T06:13:37.1461222Z * [new branch] gh/shunting314/211/orig -> origin/gh/shunting314/211/orig 2025-09-07T06:13:37.1462691Z * [new branch] gh/shunting314/212/base -> origin/gh/shunting314/212/base 2025-09-07T06:13:37.1463811Z * [new branch] gh/shunting314/212/head -> origin/gh/shunting314/212/head 2025-09-07T06:13:37.1465091Z * [new branch] gh/shunting314/212/orig -> origin/gh/shunting314/212/orig 2025-09-07T06:13:37.1467031Z * [new branch] gh/shunting314/213/base -> origin/gh/shunting314/213/base 2025-09-07T06:13:37.1468226Z * [new branch] gh/shunting314/213/head -> origin/gh/shunting314/213/head 2025-09-07T06:13:37.1469339Z * [new branch] gh/shunting314/213/orig -> origin/gh/shunting314/213/orig 2025-09-07T06:13:37.1470982Z * [new branch] gh/shunting314/214/base -> origin/gh/shunting314/214/base 2025-09-07T06:13:37.1472081Z * [new branch] gh/shunting314/214/head -> origin/gh/shunting314/214/head 2025-09-07T06:13:37.1473220Z * [new branch] gh/shunting314/214/orig -> origin/gh/shunting314/214/orig 2025-09-07T06:13:37.1474945Z * [new branch] gh/shunting314/215/base -> origin/gh/shunting314/215/base 2025-09-07T06:13:37.1476068Z * [new branch] gh/shunting314/215/head -> origin/gh/shunting314/215/head 2025-09-07T06:13:37.1477173Z * [new branch] gh/shunting314/215/orig -> origin/gh/shunting314/215/orig 2025-09-07T06:13:37.1478669Z * [new branch] gh/shunting314/216/base -> origin/gh/shunting314/216/base 2025-09-07T06:13:37.1479758Z * [new branch] gh/shunting314/216/head -> origin/gh/shunting314/216/head 2025-09-07T06:13:37.1480805Z * [new branch] gh/shunting314/216/orig -> origin/gh/shunting314/216/orig 2025-09-07T06:13:37.1482371Z * 
[new branch] gh/shunting314/217/base -> origin/gh/shunting314/217/base 2025-09-07T06:13:37.1483551Z * [new branch] gh/shunting314/217/head -> origin/gh/shunting314/217/head 2025-09-07T06:13:37.1484807Z * [new branch] gh/shunting314/217/orig -> origin/gh/shunting314/217/orig 2025-09-07T06:13:37.1486574Z * [new branch] gh/shunting314/218/base -> origin/gh/shunting314/218/base 2025-09-07T06:13:37.1487660Z * [new branch] gh/shunting314/218/head -> origin/gh/shunting314/218/head 2025-09-07T06:13:37.1488846Z * [new branch] gh/shunting314/218/orig -> origin/gh/shunting314/218/orig 2025-09-07T06:13:37.1490252Z * [new branch] gh/shunting314/219/base -> origin/gh/shunting314/219/base 2025-09-07T06:13:37.1491375Z * [new branch] gh/shunting314/219/head -> origin/gh/shunting314/219/head 2025-09-07T06:13:37.1494006Z * [new branch] gh/shunting314/219/orig -> origin/gh/shunting314/219/orig 2025-09-07T06:13:37.1496514Z * [new branch] gh/shunting314/220/base -> origin/gh/shunting314/220/base 2025-09-07T06:13:37.1497955Z * [new branch] gh/shunting314/220/head -> origin/gh/shunting314/220/head 2025-09-07T06:13:37.1499231Z * [new branch] gh/shunting314/220/orig -> origin/gh/shunting314/220/orig 2025-09-07T06:13:37.1501443Z * [new branch] gh/shunting314/221/base -> origin/gh/shunting314/221/base 2025-09-07T06:13:37.1502587Z * [new branch] gh/shunting314/221/head -> origin/gh/shunting314/221/head 2025-09-07T06:13:37.1503758Z * [new branch] gh/shunting314/221/orig -> origin/gh/shunting314/221/orig 2025-09-07T06:13:37.1505397Z * [new branch] gh/shunting314/222/base -> origin/gh/shunting314/222/base 2025-09-07T06:13:37.1506467Z * [new branch] gh/shunting314/222/head -> origin/gh/shunting314/222/head 2025-09-07T06:13:37.1507615Z * [new branch] gh/shunting314/222/orig -> origin/gh/shunting314/222/orig 2025-09-07T06:13:37.1509459Z * [new branch] gh/shunting314/223/base -> origin/gh/shunting314/223/base 2025-09-07T06:13:37.1510569Z * [new branch] gh/shunting314/223/head -> 
origin/gh/shunting314/223/head 2025-09-07T06:13:37.1511713Z * [new branch] gh/shunting314/223/orig -> origin/gh/shunting314/223/orig 2025-09-07T06:13:37.1513658Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-09-07T06:13:37.1514800Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-09-07T06:13:37.1516261Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-09-07T06:13:37.1517325Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-09-07T06:13:37.1518702Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-09-07T06:13:37.1519841Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-09-07T06:13:37.1521198Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-09-07T06:13:37.1522377Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-09-07T06:13:37.1524301Z * [new branch] gh/sinhaanhsul/1/base -> origin/gh/sinhaanhsul/1/base 2025-09-07T06:13:37.1525349Z * [new branch] gh/sinhaanhsul/1/head -> origin/gh/sinhaanhsul/1/head 2025-09-07T06:13:37.1527181Z * [new branch] gh/skarjala/17/base -> origin/gh/skarjala/17/base 2025-09-07T06:13:37.1528322Z * [new branch] gh/skarjala/17/head -> origin/gh/skarjala/17/head 2025-09-07T06:13:37.1529553Z * [new branch] gh/skarjala/17/orig -> origin/gh/skarjala/17/orig 2025-09-07T06:13:37.1531117Z * [new branch] gh/skarjala/18/base -> origin/gh/skarjala/18/base 2025-09-07T06:13:37.1532269Z * [new branch] gh/skarjala/18/head -> origin/gh/skarjala/18/head 2025-09-07T06:13:37.1533694Z * [new branch] gh/skarjala/18/orig -> origin/gh/skarjala/18/orig 2025-09-07T06:13:37.1535470Z * [new branch] gh/skarjala/19/base -> origin/gh/skarjala/19/base 2025-09-07T06:13:37.1536648Z * [new branch] gh/skarjala/19/head -> origin/gh/skarjala/19/head 2025-09-07T06:13:37.1537797Z * [new branch] gh/skarjala/19/orig -> origin/gh/skarjala/19/orig 2025-09-07T06:13:37.1539750Z * [new branch] gh/slayton58/1/base -> 
origin/gh/slayton58/1/base 2025-09-07T06:13:37.1540924Z * [new branch] gh/slayton58/1/head -> origin/gh/slayton58/1/head 2025-09-07T06:13:37.1542099Z * [new branch] gh/slayton58/1/orig -> origin/gh/slayton58/1/orig 2025-09-07T06:13:37.1543606Z * [new branch] gh/slayton58/2/base -> origin/gh/slayton58/2/base 2025-09-07T06:13:37.1544892Z * [new branch] gh/slayton58/2/head -> origin/gh/slayton58/2/head 2025-09-07T06:13:37.1546154Z * [new branch] gh/slayton58/2/orig -> origin/gh/slayton58/2/orig 2025-09-07T06:13:37.1547622Z * [new branch] gh/slayton58/3/base -> origin/gh/slayton58/3/base 2025-09-07T06:13:37.1548695Z * [new branch] gh/slayton58/3/head -> origin/gh/slayton58/3/head 2025-09-07T06:13:37.1549842Z * [new branch] gh/slayton58/3/orig -> origin/gh/slayton58/3/orig 2025-09-07T06:13:37.1551313Z * [new branch] gh/slayton58/4/base -> origin/gh/slayton58/4/base 2025-09-07T06:13:37.1552365Z * [new branch] gh/slayton58/4/head -> origin/gh/slayton58/4/head 2025-09-07T06:13:37.1553531Z * [new branch] gh/slayton58/4/orig -> origin/gh/slayton58/4/orig 2025-09-07T06:13:37.1554930Z * [new branch] gh/slayton58/5/base -> origin/gh/slayton58/5/base 2025-09-07T06:13:37.1556057Z * [new branch] gh/slayton58/5/head -> origin/gh/slayton58/5/head 2025-09-07T06:13:37.1557184Z * [new branch] gh/slayton58/5/orig -> origin/gh/slayton58/5/orig 2025-09-07T06:13:37.1559399Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-09-07T06:13:37.1560466Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-09-07T06:13:37.1561632Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-09-07T06:13:37.1563278Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-09-07T06:13:37.1564438Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-09-07T06:13:37.1565566Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-09-07T06:13:37.1567366Z * [new branch] gh/soulitzer/287/base -> 
origin/gh/soulitzer/287/base 2025-09-07T06:13:37.1568463Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-09-07T06:13:37.1569668Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-09-07T06:13:37.1571444Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-09-07T06:13:37.1572474Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-09-07T06:13:37.1574074Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-09-07T06:13:37.1575674Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-09-07T06:13:37.1576895Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-09-07T06:13:37.1578123Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-09-07T06:13:37.1579695Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-09-07T06:13:37.1580942Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-09-07T06:13:37.1582099Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-09-07T06:13:37.1583854Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-09-07T06:13:37.1585157Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-09-07T06:13:37.1586298Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-09-07T06:13:37.1587801Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-09-07T06:13:37.1588934Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-09-07T06:13:37.1590051Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-09-07T06:13:37.1591721Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-09-07T06:13:37.1593351Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-09-07T06:13:37.1594514Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 
2025-09-07T06:13:37.1596216Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-09-07T06:13:37.1597311Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-09-07T06:13:37.1598467Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-09-07T06:13:37.1600164Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-09-07T06:13:37.1601269Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-09-07T06:13:37.1602379Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-09-07T06:13:37.1604240Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-09-07T06:13:37.1605435Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-09-07T06:13:37.1606512Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-09-07T06:13:37.1608329Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-09-07T06:13:37.1609452Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-09-07T06:13:37.1610654Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-09-07T06:13:37.1612210Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-09-07T06:13:37.1613439Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-09-07T06:13:37.1614628Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-09-07T06:13:37.1616217Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-09-07T06:13:37.1617363Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-09-07T06:13:37.1618590Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-09-07T06:13:37.1620058Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-09-07T06:13:37.1621527Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-09-07T06:13:37.1622589Z * [new 
branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-09-07T06:13:37.1624841Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-09-07T06:13:37.1626229Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-09-07T06:13:37.1627323Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-09-07T06:13:37.1629389Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-09-07T06:13:37.1630586Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-09-07T06:13:37.1631738Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-09-07T06:13:37.1633351Z * [new branch] gh/soulitzer/362/base -> origin/gh/soulitzer/362/base 2025-09-07T06:13:37.1634479Z * [new branch] gh/soulitzer/362/head -> origin/gh/soulitzer/362/head 2025-09-07T06:13:37.1635574Z * [new branch] gh/soulitzer/362/orig -> origin/gh/soulitzer/362/orig 2025-09-07T06:13:37.1637120Z * [new branch] gh/soulitzer/372/base -> origin/gh/soulitzer/372/base 2025-09-07T06:13:37.1638273Z * [new branch] gh/soulitzer/372/head -> origin/gh/soulitzer/372/head 2025-09-07T06:13:37.1639386Z * [new branch] gh/soulitzer/372/orig -> origin/gh/soulitzer/372/orig 2025-09-07T06:13:37.1641055Z * [new branch] gh/soulitzer/373/base -> origin/gh/soulitzer/373/base 2025-09-07T06:13:37.1642122Z * [new branch] gh/soulitzer/373/head -> origin/gh/soulitzer/373/head 2025-09-07T06:13:37.1643229Z * [new branch] gh/soulitzer/373/orig -> origin/gh/soulitzer/373/orig 2025-09-07T06:13:37.1644820Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-09-07T06:13:37.1645982Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-09-07T06:13:37.1647129Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-09-07T06:13:37.1648594Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-09-07T06:13:37.1649643Z * [new branch] gh/soulitzer/375/head -> 
origin/gh/soulitzer/375/head 2025-09-07T06:13:37.1650881Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-09-07T06:13:37.1652419Z * [new branch] gh/soulitzer/376/base -> origin/gh/soulitzer/376/base 2025-09-07T06:13:37.1653826Z * [new branch] gh/soulitzer/376/head -> origin/gh/soulitzer/376/head 2025-09-07T06:13:37.1654979Z * [new branch] gh/soulitzer/376/orig -> origin/gh/soulitzer/376/orig 2025-09-07T06:13:37.1656671Z * [new branch] gh/soulitzer/377/base -> origin/gh/soulitzer/377/base 2025-09-07T06:13:37.1657753Z * [new branch] gh/soulitzer/377/head -> origin/gh/soulitzer/377/head 2025-09-07T06:13:37.1658893Z * [new branch] gh/soulitzer/377/orig -> origin/gh/soulitzer/377/orig 2025-09-07T06:13:37.1660717Z * [new branch] gh/soulitzer/378/base -> origin/gh/soulitzer/378/base 2025-09-07T06:13:37.1661822Z * [new branch] gh/soulitzer/378/head -> origin/gh/soulitzer/378/head 2025-09-07T06:13:37.1662989Z * [new branch] gh/soulitzer/378/orig -> origin/gh/soulitzer/378/orig 2025-09-07T06:13:37.1664752Z * [new branch] gh/soulitzer/379/base -> origin/gh/soulitzer/379/base 2025-09-07T06:13:37.1665958Z * [new branch] gh/soulitzer/379/head -> origin/gh/soulitzer/379/head 2025-09-07T06:13:37.1667003Z * [new branch] gh/soulitzer/379/orig -> origin/gh/soulitzer/379/orig 2025-09-07T06:13:37.1668918Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-09-07T06:13:37.1670860Z * [new branch] gh/swolchok/767/base -> origin/gh/swolchok/767/base 2025-09-07T06:13:37.1672413Z * [new branch] gh/swolchok/767/head -> origin/gh/swolchok/767/head 2025-09-07T06:13:37.1673834Z * [new branch] gh/swolchok/767/orig -> origin/gh/swolchok/767/orig 2025-09-07T06:13:37.1675537Z * [new branch] gh/swolchok/768/base -> origin/gh/swolchok/768/base 2025-09-07T06:13:37.1676676Z * [new branch] gh/swolchok/768/head -> origin/gh/swolchok/768/head 2025-09-07T06:13:37.1677953Z * [new branch] gh/swolchok/768/orig -> origin/gh/swolchok/768/orig 
2025-09-07T06:13:37.1679806Z * [new branch] gh/swolchok/769/base -> origin/gh/swolchok/769/base 2025-09-07T06:13:37.1680935Z * [new branch] gh/swolchok/769/head -> origin/gh/swolchok/769/head 2025-09-07T06:13:37.1682402Z * [new branch] gh/swolchok/769/orig -> origin/gh/swolchok/769/orig 2025-09-07T06:13:37.1683869Z * [new branch] gh/swolchok/771/base -> origin/gh/swolchok/771/base 2025-09-07T06:13:37.1685088Z * [new branch] gh/swolchok/771/head -> origin/gh/swolchok/771/head 2025-09-07T06:13:37.1686361Z * [new branch] gh/swolchok/771/orig -> origin/gh/swolchok/771/orig 2025-09-07T06:13:37.1687810Z * [new branch] gh/swolchok/772/base -> origin/gh/swolchok/772/base 2025-09-07T06:13:37.1689066Z * [new branch] gh/swolchok/772/head -> origin/gh/swolchok/772/head 2025-09-07T06:13:37.1690345Z * [new branch] gh/swolchok/772/orig -> origin/gh/swolchok/772/orig 2025-09-07T06:13:37.1692052Z * [new branch] gh/swolchok/773/base -> origin/gh/swolchok/773/base 2025-09-07T06:13:37.1693901Z * [new branch] gh/swolchok/773/head -> origin/gh/swolchok/773/head 2025-09-07T06:13:37.1695170Z * [new branch] gh/swolchok/773/orig -> origin/gh/swolchok/773/orig 2025-09-07T06:13:37.1696611Z * [new branch] gh/swolchok/786/base -> origin/gh/swolchok/786/base 2025-09-07T06:13:37.1697716Z * [new branch] gh/swolchok/786/head -> origin/gh/swolchok/786/head 2025-09-07T06:13:37.1698874Z * [new branch] gh/swolchok/786/orig -> origin/gh/swolchok/786/orig 2025-09-07T06:13:37.1700298Z * [new branch] gh/swolchok/787/base -> origin/gh/swolchok/787/base 2025-09-07T06:13:37.1701607Z * [new branch] gh/swolchok/787/head -> origin/gh/swolchok/787/head 2025-09-07T06:13:37.1702688Z * [new branch] gh/swolchok/787/orig -> origin/gh/swolchok/787/orig 2025-09-07T06:13:37.1704256Z * [new branch] gh/swolchok/788/base -> origin/gh/swolchok/788/base 2025-09-07T06:13:37.1705627Z * [new branch] gh/swolchok/788/head -> origin/gh/swolchok/788/head 2025-09-07T06:13:37.1706711Z * [new branch] gh/swolchok/788/orig -> 
origin/gh/swolchok/788/orig 2025-09-07T06:13:37.1708167Z * [new branch] gh/swolchok/789/base -> origin/gh/swolchok/789/base 2025-09-07T06:13:37.1709400Z * [new branch] gh/swolchok/789/head -> origin/gh/swolchok/789/head 2025-09-07T06:13:37.1710544Z * [new branch] gh/swolchok/789/orig -> origin/gh/swolchok/789/orig 2025-09-07T06:13:37.1712043Z * [new branch] gh/swolchok/790/base -> origin/gh/swolchok/790/base 2025-09-07T06:13:37.1713290Z * [new branch] gh/swolchok/790/head -> origin/gh/swolchok/790/head 2025-09-07T06:13:37.1714287Z * [new branch] gh/swolchok/790/orig -> origin/gh/swolchok/790/orig 2025-09-07T06:13:37.1715945Z * [new branch] gh/swolchok/791/base -> origin/gh/swolchok/791/base 2025-09-07T06:13:37.1716968Z * [new branch] gh/swolchok/791/head -> origin/gh/swolchok/791/head 2025-09-07T06:13:37.1718054Z * [new branch] gh/swolchok/791/orig -> origin/gh/swolchok/791/orig 2025-09-07T06:13:37.1719768Z * [new branch] gh/swolchok/792/base -> origin/gh/swolchok/792/base 2025-09-07T06:13:37.1720886Z * [new branch] gh/swolchok/792/head -> origin/gh/swolchok/792/head 2025-09-07T06:13:37.1722005Z * [new branch] gh/swolchok/792/orig -> origin/gh/swolchok/792/orig 2025-09-07T06:13:37.1723655Z * [new branch] gh/swolchok/793/base -> origin/gh/swolchok/793/base 2025-09-07T06:13:37.1724734Z * [new branch] gh/swolchok/793/head -> origin/gh/swolchok/793/head 2025-09-07T06:13:37.1725845Z * [new branch] gh/swolchok/793/orig -> origin/gh/swolchok/793/orig 2025-09-07T06:13:37.1727475Z * [new branch] gh/swolchok/794/base -> origin/gh/swolchok/794/base 2025-09-07T06:13:37.1728539Z * [new branch] gh/swolchok/794/head -> origin/gh/swolchok/794/head 2025-09-07T06:13:37.1729574Z * [new branch] gh/swolchok/794/orig -> origin/gh/swolchok/794/orig 2025-09-07T06:13:37.1731878Z * [new branch] gh/swolchok/795/base -> origin/gh/swolchok/795/base 2025-09-07T06:13:37.1732958Z * [new branch] gh/swolchok/795/head -> origin/gh/swolchok/795/head 2025-09-07T06:13:37.1734427Z * [new branch] 
gh/swolchok/795/orig -> origin/gh/swolchok/795/orig 2025-09-07T06:13:37.1736044Z * [new branch] gh/swolchok/796/base -> origin/gh/swolchok/796/base 2025-09-07T06:13:37.1737343Z * [new branch] gh/swolchok/796/head -> origin/gh/swolchok/796/head 2025-09-07T06:13:37.1738662Z * [new branch] gh/swolchok/796/orig -> origin/gh/swolchok/796/orig 2025-09-07T06:13:37.1740354Z * [new branch] gh/swolchok/797/base -> origin/gh/swolchok/797/base 2025-09-07T06:13:37.1741729Z * [new branch] gh/swolchok/797/head -> origin/gh/swolchok/797/head 2025-09-07T06:13:37.1743004Z * [new branch] gh/swolchok/797/orig -> origin/gh/swolchok/797/orig 2025-09-07T06:13:37.1744688Z * [new branch] gh/swolchok/798/base -> origin/gh/swolchok/798/base 2025-09-07T06:13:37.1745864Z * [new branch] gh/swolchok/798/head -> origin/gh/swolchok/798/head 2025-09-07T06:13:37.1747169Z * [new branch] gh/swolchok/798/orig -> origin/gh/swolchok/798/orig 2025-09-07T06:13:37.1748846Z * [new branch] gh/swolchok/799/base -> origin/gh/swolchok/799/base 2025-09-07T06:13:37.1749949Z * [new branch] gh/swolchok/799/head -> origin/gh/swolchok/799/head 2025-09-07T06:13:37.1751174Z * [new branch] gh/swolchok/799/orig -> origin/gh/swolchok/799/orig 2025-09-07T06:13:37.1752901Z * [new branch] gh/swolchok/800/base -> origin/gh/swolchok/800/base 2025-09-07T06:13:37.1754101Z * [new branch] gh/swolchok/800/head -> origin/gh/swolchok/800/head 2025-09-07T06:13:37.1755356Z * [new branch] gh/swolchok/800/orig -> origin/gh/swolchok/800/orig 2025-09-07T06:13:37.1757093Z * [new branch] gh/swolchok/801/base -> origin/gh/swolchok/801/base 2025-09-07T06:13:37.1758166Z * [new branch] gh/swolchok/801/head -> origin/gh/swolchok/801/head 2025-09-07T06:13:37.1759313Z * [new branch] gh/swolchok/801/orig -> origin/gh/swolchok/801/orig 2025-09-07T06:13:37.1761011Z * [new branch] gh/swolchok/802/base -> origin/gh/swolchok/802/base 2025-09-07T06:13:37.1762043Z * [new branch] gh/swolchok/802/head -> origin/gh/swolchok/802/head 
2025-09-07T06:13:37.1763199Z * [new branch] gh/swolchok/802/orig -> origin/gh/swolchok/802/orig 2025-09-07T06:13:37.1764748Z * [new branch] gh/swolchok/803/base -> origin/gh/swolchok/803/base 2025-09-07T06:13:37.1765807Z * [new branch] gh/swolchok/803/head -> origin/gh/swolchok/803/head 2025-09-07T06:13:37.1767057Z * [new branch] gh/swolchok/803/orig -> origin/gh/swolchok/803/orig 2025-09-07T06:13:37.1768805Z * [new branch] gh/swolchok/804/base -> origin/gh/swolchok/804/base 2025-09-07T06:13:37.1769917Z * [new branch] gh/swolchok/804/head -> origin/gh/swolchok/804/head 2025-09-07T06:13:37.1771244Z * [new branch] gh/swolchok/804/orig -> origin/gh/swolchok/804/orig 2025-09-07T06:13:37.1773187Z * [new branch] gh/swolchok/805/base -> origin/gh/swolchok/805/base 2025-09-07T06:13:37.1774243Z * [new branch] gh/swolchok/805/head -> origin/gh/swolchok/805/head 2025-09-07T06:13:37.1775433Z * [new branch] gh/swolchok/805/orig -> origin/gh/swolchok/805/orig 2025-09-07T06:13:37.1776894Z * [new branch] gh/swolchok/806/base -> origin/gh/swolchok/806/base 2025-09-07T06:13:37.1778132Z * [new branch] gh/swolchok/806/head -> origin/gh/swolchok/806/head 2025-09-07T06:13:37.1779324Z * [new branch] gh/swolchok/806/orig -> origin/gh/swolchok/806/orig 2025-09-07T06:13:37.1781021Z * [new branch] gh/swolchok/807/base -> origin/gh/swolchok/807/base 2025-09-07T06:13:37.1782123Z * [new branch] gh/swolchok/807/head -> origin/gh/swolchok/807/head 2025-09-07T06:13:37.1783408Z * [new branch] gh/swolchok/807/orig -> origin/gh/swolchok/807/orig 2025-09-07T06:13:37.1785263Z * [new branch] gh/swolchok/808/base -> origin/gh/swolchok/808/base 2025-09-07T06:13:37.1786505Z * [new branch] gh/swolchok/808/head -> origin/gh/swolchok/808/head 2025-09-07T06:13:37.1787614Z * [new branch] gh/swolchok/808/orig -> origin/gh/swolchok/808/orig 2025-09-07T06:13:37.1789128Z * [new branch] gh/swolchok/809/base -> origin/gh/swolchok/809/base 2025-09-07T06:13:37.1790314Z * [new branch] gh/swolchok/809/head -> 
origin/gh/swolchok/809/head 2025-09-07T06:13:37.1791481Z * [new branch] gh/swolchok/809/orig -> origin/gh/swolchok/809/orig 2025-09-07T06:13:37.1793519Z * [new branch] gh/swolchok/810/base -> origin/gh/swolchok/810/base 2025-09-07T06:13:37.1794759Z * [new branch] gh/swolchok/810/head -> origin/gh/swolchok/810/head 2025-09-07T06:13:37.1795951Z * [new branch] gh/swolchok/810/orig -> origin/gh/swolchok/810/orig 2025-09-07T06:13:37.1797562Z * [new branch] gh/swolchok/811/base -> origin/gh/swolchok/811/base 2025-09-07T06:13:37.1798787Z * [new branch] gh/swolchok/811/head -> origin/gh/swolchok/811/head 2025-09-07T06:13:37.1800077Z * [new branch] gh/swolchok/811/orig -> origin/gh/swolchok/811/orig 2025-09-07T06:13:37.1801757Z * [new branch] gh/swolchok/812/base -> origin/gh/swolchok/812/base 2025-09-07T06:13:37.1803081Z * [new branch] gh/swolchok/812/head -> origin/gh/swolchok/812/head 2025-09-07T06:13:37.1804157Z * [new branch] gh/swolchok/812/orig -> origin/gh/swolchok/812/orig 2025-09-07T06:13:37.1805907Z * [new branch] gh/swolchok/813/base -> origin/gh/swolchok/813/base 2025-09-07T06:13:37.1807023Z * [new branch] gh/swolchok/813/head -> origin/gh/swolchok/813/head 2025-09-07T06:13:37.1808252Z * [new branch] gh/swolchok/813/orig -> origin/gh/swolchok/813/orig 2025-09-07T06:13:37.1810405Z * [new branch] gh/swolchok/814/base -> origin/gh/swolchok/814/base 2025-09-07T06:13:37.1811830Z * [new branch] gh/swolchok/814/head -> origin/gh/swolchok/814/head 2025-09-07T06:13:37.1812906Z * [new branch] gh/swolchok/814/orig -> origin/gh/swolchok/814/orig 2025-09-07T06:13:37.1814936Z * [new branch] gh/swolchok/815/base -> origin/gh/swolchok/815/base 2025-09-07T06:13:37.1815949Z * [new branch] gh/swolchok/815/head -> origin/gh/swolchok/815/head 2025-09-07T06:13:37.1817101Z * [new branch] gh/swolchok/815/orig -> origin/gh/swolchok/815/orig 2025-09-07T06:13:37.1818938Z * [new branch] gh/swolchok/816/base -> origin/gh/swolchok/816/base 2025-09-07T06:13:37.1820183Z * [new branch] 
gh/swolchok/816/head -> origin/gh/swolchok/816/head 2025-09-07T06:13:37.1821373Z * [new branch] gh/swolchok/816/orig -> origin/gh/swolchok/816/orig 2025-09-07T06:13:37.1823159Z * [new branch] gh/swolchok/817/base -> origin/gh/swolchok/817/base 2025-09-07T06:13:37.1824147Z * [new branch] gh/swolchok/817/head -> origin/gh/swolchok/817/head 2025-09-07T06:13:37.1825391Z * [new branch] gh/swolchok/817/orig -> origin/gh/swolchok/817/orig 2025-09-07T06:13:37.1827208Z * [new branch] gh/swolchok/818/base -> origin/gh/swolchok/818/base 2025-09-07T06:13:37.1828208Z * [new branch] gh/swolchok/818/head -> origin/gh/swolchok/818/head 2025-09-07T06:13:37.1829343Z * [new branch] gh/swolchok/818/orig -> origin/gh/swolchok/818/orig 2025-09-07T06:13:37.1831158Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-09-07T06:13:37.1832198Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-09-07T06:13:37.1833384Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-09-07T06:13:37.1834949Z * [new branch] gh/swolchok/820/base -> origin/gh/swolchok/820/base 2025-09-07T06:13:37.1836092Z * [new branch] gh/swolchok/820/head -> origin/gh/swolchok/820/head 2025-09-07T06:13:37.1837298Z * [new branch] gh/swolchok/820/orig -> origin/gh/swolchok/820/orig 2025-09-07T06:13:37.1838894Z * [new branch] gh/swolchok/821/base -> origin/gh/swolchok/821/base 2025-09-07T06:13:37.1839939Z * [new branch] gh/swolchok/821/head -> origin/gh/swolchok/821/head 2025-09-07T06:13:37.1841077Z * [new branch] gh/swolchok/821/orig -> origin/gh/swolchok/821/orig 2025-09-07T06:13:37.1842790Z * [new branch] gh/swolchok/822/base -> origin/gh/swolchok/822/base 2025-09-07T06:13:37.1843801Z * [new branch] gh/swolchok/822/head -> origin/gh/swolchok/822/head 2025-09-07T06:13:37.1844942Z * [new branch] gh/swolchok/822/orig -> origin/gh/swolchok/822/orig 2025-09-07T06:13:37.1846578Z * [new branch] gh/swolchok/823/base -> origin/gh/swolchok/823/base 
2025-09-07T06:13:37.1847598Z * [new branch] gh/swolchok/823/head -> origin/gh/swolchok/823/head 2025-09-07T06:13:37.1848724Z * [new branch] gh/swolchok/823/orig -> origin/gh/swolchok/823/orig 2025-09-07T06:13:37.1850215Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-09-07T06:13:37.1851433Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-09-07T06:13:37.1852522Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-09-07T06:13:37.1854485Z * [new branch] gh/swolchok/825/base -> origin/gh/swolchok/825/base 2025-09-07T06:13:37.1855781Z * [new branch] gh/swolchok/825/head -> origin/gh/swolchok/825/head 2025-09-07T06:13:37.1862879Z * [new branch] gh/swolchok/825/orig -> origin/gh/swolchok/825/orig 2025-09-07T06:13:37.1863248Z * [new branch] gh/swolchok/826/base -> origin/gh/swolchok/826/base 2025-09-07T06:13:37.1863519Z * [new branch] gh/swolchok/826/head -> origin/gh/swolchok/826/head 2025-09-07T06:13:37.1863766Z * [new branch] gh/swolchok/826/orig -> origin/gh/swolchok/826/orig 2025-09-07T06:13:37.1864026Z * [new branch] gh/swolchok/827/base -> origin/gh/swolchok/827/base 2025-09-07T06:13:37.1864269Z * [new branch] gh/swolchok/827/head -> origin/gh/swolchok/827/head 2025-09-07T06:13:37.1864752Z * [new branch] gh/swolchok/827/orig -> origin/gh/swolchok/827/orig 2025-09-07T06:13:37.1866584Z * [new branch] gh/swolchok/828/base -> origin/gh/swolchok/828/base 2025-09-07T06:13:37.1867719Z * [new branch] gh/swolchok/828/head -> origin/gh/swolchok/828/head 2025-09-07T06:13:37.1868808Z * [new branch] gh/swolchok/828/orig -> origin/gh/swolchok/828/orig 2025-09-07T06:13:37.1870209Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-09-07T06:13:37.1871365Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-09-07T06:13:37.1872516Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-09-07T06:13:37.1874217Z * [new branch] gh/swolchok/830/base -> 
origin/gh/swolchok/830/base 2025-09-07T06:13:37.1875289Z * [new branch] gh/swolchok/830/head -> origin/gh/swolchok/830/head 2025-09-07T06:13:37.1876343Z * [new branch] gh/swolchok/830/orig -> origin/gh/swolchok/830/orig 2025-09-07T06:13:37.1877748Z * [new branch] gh/swolchok/831/base -> origin/gh/swolchok/831/base 2025-09-07T06:13:37.1879092Z * [new branch] gh/swolchok/831/head -> origin/gh/swolchok/831/head 2025-09-07T06:13:37.1880198Z * [new branch] gh/swolchok/831/orig -> origin/gh/swolchok/831/orig 2025-09-07T06:13:37.1881644Z * [new branch] gh/swolchok/832/base -> origin/gh/swolchok/832/base 2025-09-07T06:13:37.1882912Z * [new branch] gh/swolchok/832/head -> origin/gh/swolchok/832/head 2025-09-07T06:13:37.1883957Z * [new branch] gh/swolchok/832/orig -> origin/gh/swolchok/832/orig 2025-09-07T06:13:37.1885763Z * [new branch] gh/syed-ahmed/3/base -> origin/gh/syed-ahmed/3/base 2025-09-07T06:13:37.1886893Z * [new branch] gh/syed-ahmed/3/head -> origin/gh/syed-ahmed/3/head 2025-09-07T06:13:37.1887982Z * [new branch] gh/syed-ahmed/3/orig -> origin/gh/syed-ahmed/3/orig 2025-09-07T06:13:37.1889488Z * [new branch] gh/syed-ahmed/4/base -> origin/gh/syed-ahmed/4/base 2025-09-07T06:13:37.1890623Z * [new branch] gh/syed-ahmed/4/head -> origin/gh/syed-ahmed/4/head 2025-09-07T06:13:37.1891710Z * [new branch] gh/syed-ahmed/4/orig -> origin/gh/syed-ahmed/4/orig 2025-09-07T06:13:37.1894459Z * [new branch] gh/syed-ahmed/5/base -> origin/gh/syed-ahmed/5/base 2025-09-07T06:13:37.1895581Z * [new branch] gh/syed-ahmed/5/head -> origin/gh/syed-ahmed/5/head 2025-09-07T06:13:37.1896713Z * [new branch] gh/syed-ahmed/5/orig -> origin/gh/syed-ahmed/5/orig 2025-09-07T06:13:37.1898845Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-09-07T06:13:37.1900141Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-09-07T06:13:37.1901805Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-09-07T06:13:37.1903817Z * [new branch] gh/tianyu-l/2/base 
-> origin/gh/tianyu-l/2/base 2025-09-07T06:13:37.1905032Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-09-07T06:13:37.1906173Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-09-07T06:13:37.1907662Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-09-07T06:13:37.1908778Z * [new branch] gh/tianyu-l/3/head -> origin/gh/tianyu-l/3/head 2025-09-07T06:13:37.1909915Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-09-07T06:13:37.1911496Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-09-07T06:13:37.1912590Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-09-07T06:13:37.1913797Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-09-07T06:13:37.1915830Z * [new branch] gh/tugsbayasgalan/1/base -> origin/gh/tugsbayasgalan/1/base 2025-09-07T06:13:37.1916902Z * [new branch] gh/tugsbayasgalan/1/head -> origin/gh/tugsbayasgalan/1/head 2025-09-07T06:13:37.1918140Z * [new branch] gh/tugsbayasgalan/1/orig -> origin/gh/tugsbayasgalan/1/orig 2025-09-07T06:13:37.1920028Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-09-07T06:13:37.1921162Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-09-07T06:13:37.1922300Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-09-07T06:13:37.1923670Z * [new branch] gh/tugsbayasgalan/11/base -> origin/gh/tugsbayasgalan/11/base 2025-09-07T06:13:37.1924868Z * [new branch] gh/tugsbayasgalan/11/head -> origin/gh/tugsbayasgalan/11/head 2025-09-07T06:13:37.1926020Z * [new branch] gh/tugsbayasgalan/11/orig -> origin/gh/tugsbayasgalan/11/orig 2025-09-07T06:13:37.1927636Z * [new branch] gh/tugsbayasgalan/12/base -> origin/gh/tugsbayasgalan/12/base 2025-09-07T06:13:37.1928858Z * [new branch] gh/tugsbayasgalan/12/head -> origin/gh/tugsbayasgalan/12/head 2025-09-07T06:13:37.1929963Z * [new branch] gh/tugsbayasgalan/12/orig -> 
origin/gh/tugsbayasgalan/12/orig 2025-09-07T06:13:37.1931454Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-09-07T06:13:37.1932559Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-09-07T06:13:37.1934051Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-09-07T06:13:37.1935751Z * [new branch] gh/tugsbayasgalan/14/base -> origin/gh/tugsbayasgalan/14/base 2025-09-07T06:13:37.1936861Z * [new branch] gh/tugsbayasgalan/14/head -> origin/gh/tugsbayasgalan/14/head 2025-09-07T06:13:37.1938041Z * [new branch] gh/tugsbayasgalan/14/orig -> origin/gh/tugsbayasgalan/14/orig 2025-09-07T06:13:37.1939779Z * [new branch] gh/tugsbayasgalan/15/base -> origin/gh/tugsbayasgalan/15/base 2025-09-07T06:13:37.1940948Z * [new branch] gh/tugsbayasgalan/15/head -> origin/gh/tugsbayasgalan/15/head 2025-09-07T06:13:37.1942055Z * [new branch] gh/tugsbayasgalan/15/orig -> origin/gh/tugsbayasgalan/15/orig 2025-09-07T06:13:37.1943677Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-09-07T06:13:37.1945051Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-09-07T06:13:37.1946172Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-09-07T06:13:37.1947530Z * [new branch] gh/tugsbayasgalan/3/base -> origin/gh/tugsbayasgalan/3/base 2025-09-07T06:13:37.1948953Z * [new branch] gh/tugsbayasgalan/3/head -> origin/gh/tugsbayasgalan/3/head 2025-09-07T06:13:37.1950217Z * [new branch] gh/tugsbayasgalan/3/orig -> origin/gh/tugsbayasgalan/3/orig 2025-09-07T06:13:37.1951604Z * [new branch] gh/tugsbayasgalan/4/base -> origin/gh/tugsbayasgalan/4/base 2025-09-07T06:13:37.1952929Z * [new branch] gh/tugsbayasgalan/4/head -> origin/gh/tugsbayasgalan/4/head 2025-09-07T06:13:37.1954074Z * [new branch] gh/tugsbayasgalan/4/orig -> origin/gh/tugsbayasgalan/4/orig 2025-09-07T06:13:37.1955735Z * [new branch] gh/tugsbayasgalan/5/base -> 
origin/gh/tugsbayasgalan/5/base 2025-09-07T06:13:37.1956952Z * [new branch] gh/tugsbayasgalan/5/head -> origin/gh/tugsbayasgalan/5/head 2025-09-07T06:13:37.1958091Z * [new branch] gh/tugsbayasgalan/5/orig -> origin/gh/tugsbayasgalan/5/orig 2025-09-07T06:13:37.1959561Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-09-07T06:13:37.1960759Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-09-07T06:13:37.1961895Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-09-07T06:13:37.1963488Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-09-07T06:13:37.1964574Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-09-07T06:13:37.1966256Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-09-07T06:13:37.1967798Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-09-07T06:13:37.1969308Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-09-07T06:13:37.1970465Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-09-07T06:13:37.1971990Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-09-07T06:13:37.1973222Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-09-07T06:13:37.1974469Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-09-07T06:13:37.1976452Z * [new branch] gh/v0i0/1/base -> origin/gh/v0i0/1/base 2025-09-07T06:13:37.1977646Z * [new branch] gh/v0i0/1/head -> origin/gh/v0i0/1/head 2025-09-07T06:13:37.1978871Z * [new branch] gh/v0i0/1/orig -> origin/gh/v0i0/1/orig 2025-09-07T06:13:37.1980470Z * [new branch] gh/v0i0/4/base -> origin/gh/v0i0/4/base 2025-09-07T06:13:37.1981540Z * [new branch] gh/v0i0/4/head -> origin/gh/v0i0/4/head 2025-09-07T06:13:37.1982628Z * [new branch] gh/v0i0/4/orig -> origin/gh/v0i0/4/orig 
2025-09-07T06:13:37.1984254Z * [new branch] gh/v0i0/6/base -> origin/gh/v0i0/6/base 2025-09-07T06:13:37.1985536Z * [new branch] gh/v0i0/6/head -> origin/gh/v0i0/6/head 2025-09-07T06:13:37.1986650Z * [new branch] gh/v0i0/6/orig -> origin/gh/v0i0/6/orig 2025-09-07T06:13:37.1988216Z * [new branch] gh/v0i0/7/base -> origin/gh/v0i0/7/base 2025-09-07T06:13:37.1989419Z * [new branch] gh/v0i0/7/head -> origin/gh/v0i0/7/head 2025-09-07T06:13:37.1990493Z * [new branch] gh/v0i0/7/orig -> origin/gh/v0i0/7/orig 2025-09-07T06:13:37.1992140Z * [new branch] gh/v0i0/8/base -> origin/gh/v0i0/8/base 2025-09-07T06:13:37.1993497Z * [new branch] gh/v0i0/8/head -> origin/gh/v0i0/8/head 2025-09-07T06:13:37.1994667Z * [new branch] gh/v0i0/8/orig -> origin/gh/v0i0/8/orig 2025-09-07T06:13:37.1996252Z * [new branch] gh/v0i0/9/base -> origin/gh/v0i0/9/base 2025-09-07T06:13:37.1997523Z * [new branch] gh/v0i0/9/head -> origin/gh/v0i0/9/head 2025-09-07T06:13:37.1998583Z * [new branch] gh/v0i0/9/orig -> origin/gh/v0i0/9/orig 2025-09-07T06:13:37.2000448Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-09-07T06:13:37.2001958Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-09-07T06:13:37.2003539Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-09-07T06:13:37.2005251Z * [new branch] gh/vkuzo/4/base -> origin/gh/vkuzo/4/base 2025-09-07T06:13:37.2006574Z * [new branch] gh/vkuzo/4/head -> origin/gh/vkuzo/4/head 2025-09-07T06:13:37.2007791Z * [new branch] gh/vkuzo/4/orig -> origin/gh/vkuzo/4/orig 2025-09-07T06:13:37.2009500Z * [new branch] gh/vkuzo/5/base -> origin/gh/vkuzo/5/base 2025-09-07T06:13:37.2010782Z * [new branch] gh/vkuzo/5/head -> origin/gh/vkuzo/5/head 2025-09-07T06:13:37.2011941Z * [new branch] gh/vkuzo/5/orig -> origin/gh/vkuzo/5/orig 2025-09-07T06:13:37.2013985Z * [new branch] gh/vkuzo/6/base -> origin/gh/vkuzo/6/base 2025-09-07T06:13:37.2015061Z * [new branch] gh/vkuzo/6/head -> origin/gh/vkuzo/6/head 2025-09-07T06:13:37.2016312Z * [new branch] 
gh/vkuzo/6/orig -> origin/gh/vkuzo/6/orig 2025-09-07T06:13:37.2017783Z * [new branch] gh/vkuzo/7/base -> origin/gh/vkuzo/7/base 2025-09-07T06:13:37.2019128Z * [new branch] gh/vkuzo/7/head -> origin/gh/vkuzo/7/head 2025-09-07T06:13:37.2020317Z * [new branch] gh/vkuzo/7/orig -> origin/gh/vkuzo/7/orig 2025-09-07T06:13:37.2022423Z * [new branch] gh/wconstab/419/base -> origin/gh/wconstab/419/base 2025-09-07T06:13:37.2023518Z * [new branch] gh/wconstab/419/head -> origin/gh/wconstab/419/head 2025-09-07T06:13:37.2024705Z * [new branch] gh/wconstab/419/orig -> origin/gh/wconstab/419/orig 2025-09-07T06:13:37.2026548Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-09-07T06:13:37.2027634Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-09-07T06:13:37.2028726Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-09-07T06:13:37.2030272Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-09-07T06:13:37.2031434Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-09-07T06:13:37.2032556Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-09-07T06:13:37.2034110Z * [new branch] gh/wconstab/438/base -> origin/gh/wconstab/438/base 2025-09-07T06:13:37.2035704Z * [new branch] gh/wconstab/438/head -> origin/gh/wconstab/438/head 2025-09-07T06:13:37.2036856Z * [new branch] gh/wconstab/438/orig -> origin/gh/wconstab/438/orig 2025-09-07T06:13:37.2038467Z * [new branch] gh/wconstab/440/base -> origin/gh/wconstab/440/base 2025-09-07T06:13:37.2039788Z * [new branch] gh/wconstab/440/head -> origin/gh/wconstab/440/head 2025-09-07T06:13:37.2040989Z * [new branch] gh/wconstab/440/orig -> origin/gh/wconstab/440/orig 2025-09-07T06:13:37.2042710Z * [new branch] gh/wconstab/441/base -> origin/gh/wconstab/441/base 2025-09-07T06:13:37.2043787Z * [new branch] gh/wconstab/441/head -> origin/gh/wconstab/441/head 2025-09-07T06:13:37.2045002Z * [new branch] gh/wconstab/441/orig -> 
origin/gh/wconstab/441/orig 2025-09-07T06:13:37.2046751Z * [new branch] gh/wconstab/442/base -> origin/gh/wconstab/442/base 2025-09-07T06:13:37.2047905Z * [new branch] gh/wconstab/442/head -> origin/gh/wconstab/442/head 2025-09-07T06:13:37.2049085Z * [new branch] gh/wconstab/442/orig -> origin/gh/wconstab/442/orig 2025-09-07T06:13:37.2050641Z * [new branch] gh/wconstab/443/base -> origin/gh/wconstab/443/base 2025-09-07T06:13:37.2051758Z * [new branch] gh/wconstab/443/head -> origin/gh/wconstab/443/head 2025-09-07T06:13:37.2053723Z * [new branch] gh/wconstab/443/orig -> origin/gh/wconstab/443/orig 2025-09-07T06:13:37.2055490Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-09-07T06:13:37.2056721Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-09-07T06:13:37.2057879Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-09-07T06:13:37.2059482Z * [new branch] gh/wconstab/445/base -> origin/gh/wconstab/445/base 2025-09-07T06:13:37.2060621Z * [new branch] gh/wconstab/445/head -> origin/gh/wconstab/445/head 2025-09-07T06:13:37.2061780Z * [new branch] gh/wconstab/445/orig -> origin/gh/wconstab/445/orig 2025-09-07T06:13:37.2063991Z * [new branch] gh/wconstab/446/base -> origin/gh/wconstab/446/base 2025-09-07T06:13:37.2065404Z * [new branch] gh/wconstab/446/head -> origin/gh/wconstab/446/head 2025-09-07T06:13:37.2066949Z * [new branch] gh/wconstab/446/orig -> origin/gh/wconstab/446/orig 2025-09-07T06:13:37.2068524Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-09-07T06:13:37.2069601Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-09-07T06:13:37.2070755Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-09-07T06:13:37.2072752Z * [new branch] gh/weifengpy/27/base -> origin/gh/weifengpy/27/base 2025-09-07T06:13:37.2073854Z * [new branch] gh/weifengpy/27/head -> origin/gh/weifengpy/27/head 2025-09-07T06:13:37.2074953Z * [new branch] 
gh/weifengpy/27/orig -> origin/gh/weifengpy/27/orig 2025-09-07T06:13:37.2076477Z * [new branch] gh/weifengpy/30/base -> origin/gh/weifengpy/30/base 2025-09-07T06:13:37.2077573Z * [new branch] gh/weifengpy/30/head -> origin/gh/weifengpy/30/head 2025-09-07T06:13:37.2078700Z * [new branch] gh/weifengpy/30/orig -> origin/gh/weifengpy/30/orig 2025-09-07T06:13:37.2080684Z * [new branch] gh/williamwen42/196/base -> origin/gh/williamwen42/196/base 2025-09-07T06:13:37.2081855Z * [new branch] gh/williamwen42/196/head -> origin/gh/williamwen42/196/head 2025-09-07T06:13:37.2083145Z * [new branch] gh/williamwen42/196/orig -> origin/gh/williamwen42/196/orig 2025-09-07T06:13:37.2084774Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-09-07T06:13:37.2085902Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-09-07T06:13:37.2087183Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-09-07T06:13:37.2088966Z * [new branch] gh/williamwen42/258/base -> origin/gh/williamwen42/258/base 2025-09-07T06:13:37.2090227Z * [new branch] gh/williamwen42/258/head -> origin/gh/williamwen42/258/head 2025-09-07T06:13:37.2091364Z * [new branch] gh/williamwen42/258/orig -> origin/gh/williamwen42/258/orig 2025-09-07T06:13:37.2094529Z * [new branch] gh/williamwen42/266/base -> origin/gh/williamwen42/266/base 2025-09-07T06:13:37.2095743Z * [new branch] gh/williamwen42/266/head -> origin/gh/williamwen42/266/head 2025-09-07T06:13:37.2097034Z * [new branch] gh/williamwen42/266/orig -> origin/gh/williamwen42/266/orig 2025-09-07T06:13:37.2098617Z * [new branch] gh/williamwen42/267/base -> origin/gh/williamwen42/267/base 2025-09-07T06:13:37.2099899Z * [new branch] gh/williamwen42/267/head -> origin/gh/williamwen42/267/head 2025-09-07T06:13:37.2101093Z * [new branch] gh/williamwen42/267/orig -> origin/gh/williamwen42/267/orig 2025-09-07T06:13:37.2102761Z * [new branch] gh/williamwen42/270/base -> 
origin/gh/williamwen42/270/base 2025-09-07T06:13:37.2103968Z * [new branch] gh/williamwen42/270/head -> origin/gh/williamwen42/270/head 2025-09-07T06:13:37.2105359Z * [new branch] gh/williamwen42/270/orig -> origin/gh/williamwen42/270/orig 2025-09-07T06:13:37.2106927Z * [new branch] gh/williamwen42/271/base -> origin/gh/williamwen42/271/base 2025-09-07T06:13:37.2108153Z * [new branch] gh/williamwen42/271/head -> origin/gh/williamwen42/271/head 2025-09-07T06:13:37.2109260Z * [new branch] gh/williamwen42/271/orig -> origin/gh/williamwen42/271/orig 2025-09-07T06:13:37.2110787Z * [new branch] gh/williamwen42/272/base -> origin/gh/williamwen42/272/base 2025-09-07T06:13:37.2111931Z * [new branch] gh/williamwen42/272/head -> origin/gh/williamwen42/272/head 2025-09-07T06:13:37.2113196Z * [new branch] gh/williamwen42/272/orig -> origin/gh/williamwen42/272/orig 2025-09-07T06:13:37.2115769Z * [new branch] gh/williamwen42/274/base -> origin/gh/williamwen42/274/base 2025-09-07T06:13:37.2117066Z * [new branch] gh/williamwen42/274/head -> origin/gh/williamwen42/274/head 2025-09-07T06:13:37.2117330Z * [new branch] gh/williamwen42/274/orig -> origin/gh/williamwen42/274/orig 2025-09-07T06:13:37.2118701Z * [new branch] gh/williamwen42/275/base -> origin/gh/williamwen42/275/base 2025-09-07T06:13:37.2119806Z * [new branch] gh/williamwen42/275/head -> origin/gh/williamwen42/275/head 2025-09-07T06:13:37.2121342Z * [new branch] gh/williamwen42/276/base -> origin/gh/williamwen42/276/base 2025-09-07T06:13:37.2122461Z * [new branch] gh/williamwen42/276/head -> origin/gh/williamwen42/276/head 2025-09-07T06:13:37.2123641Z * [new branch] gh/williamwen42/276/orig -> origin/gh/williamwen42/276/orig 2025-09-07T06:13:37.2125303Z * [new branch] gh/williamwen42/277/base -> origin/gh/williamwen42/277/base 2025-09-07T06:13:37.2126437Z * [new branch] gh/williamwen42/277/head -> origin/gh/williamwen42/277/head 2025-09-07T06:13:37.2127522Z * [new branch] gh/williamwen42/277/orig -> 
origin/gh/williamwen42/277/orig 2025-09-07T06:13:37.2129193Z * [new branch] gh/williamwen42/278/base -> origin/gh/williamwen42/278/base 2025-09-07T06:13:37.2130306Z * [new branch] gh/williamwen42/278/head -> origin/gh/williamwen42/278/head 2025-09-07T06:13:37.2131415Z * [new branch] gh/williamwen42/278/orig -> origin/gh/williamwen42/278/orig 2025-09-07T06:13:37.2133209Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-09-07T06:13:37.2134464Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-09-07T06:13:37.2135580Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-09-07T06:13:37.2137291Z * [new branch] gh/williamwen42/280/base -> origin/gh/williamwen42/280/base 2025-09-07T06:13:37.2138458Z * [new branch] gh/williamwen42/280/head -> origin/gh/williamwen42/280/head 2025-09-07T06:13:37.2139658Z * [new branch] gh/williamwen42/280/orig -> origin/gh/williamwen42/280/orig 2025-09-07T06:13:37.2141282Z * [new branch] gh/williamwen42/281/base -> origin/gh/williamwen42/281/base 2025-09-07T06:13:37.2142429Z * [new branch] gh/williamwen42/281/head -> origin/gh/williamwen42/281/head 2025-09-07T06:13:37.2143970Z * [new branch] gh/williamwen42/281/orig -> origin/gh/williamwen42/281/orig 2025-09-07T06:13:37.2145883Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-09-07T06:13:37.2147010Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-09-07T06:13:37.2148147Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-09-07T06:13:37.2149931Z * [new branch] gh/williamwen42/283/base -> origin/gh/williamwen42/283/base 2025-09-07T06:13:37.2151111Z * [new branch] gh/williamwen42/283/head -> origin/gh/williamwen42/283/head 2025-09-07T06:13:37.2152273Z * [new branch] gh/williamwen42/283/orig -> origin/gh/williamwen42/283/orig 2025-09-07T06:13:37.2154236Z * [new branch] gh/williamwen42/284/base -> 
origin/gh/williamwen42/284/base 2025-09-07T06:13:37.2155302Z * [new branch] gh/williamwen42/284/head -> origin/gh/williamwen42/284/head 2025-09-07T06:13:37.2156449Z * [new branch] gh/williamwen42/284/orig -> origin/gh/williamwen42/284/orig 2025-09-07T06:13:37.2157904Z * [new branch] gh/williamwen42/285/base -> origin/gh/williamwen42/285/base 2025-09-07T06:13:37.2159239Z * [new branch] gh/williamwen42/285/head -> origin/gh/williamwen42/285/head 2025-09-07T06:13:37.2160386Z * [new branch] gh/williamwen42/285/orig -> origin/gh/williamwen42/285/orig 2025-09-07T06:13:37.2161802Z * [new branch] gh/williamwen42/286/base -> origin/gh/williamwen42/286/base 2025-09-07T06:13:37.2162891Z * [new branch] gh/williamwen42/286/head -> origin/gh/williamwen42/286/head 2025-09-07T06:13:37.2164004Z * [new branch] gh/williamwen42/286/orig -> origin/gh/williamwen42/286/orig 2025-09-07T06:13:37.2165683Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-09-07T06:13:37.2166879Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-09-07T06:13:37.2168049Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-09-07T06:13:37.2169788Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-09-07T06:13:37.2170941Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-09-07T06:13:37.2172074Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-09-07T06:13:37.2173979Z * [new branch] gh/williamwen42/289/base -> origin/gh/williamwen42/289/base 2025-09-07T06:13:37.2175116Z * [new branch] gh/williamwen42/289/head -> origin/gh/williamwen42/289/head 2025-09-07T06:13:37.2176299Z * [new branch] gh/williamwen42/289/orig -> origin/gh/williamwen42/289/orig 2025-09-07T06:13:37.2178470Z * [new branch] gh/wychi/1/base -> origin/gh/wychi/1/base 2025-09-07T06:13:37.2179849Z * [new branch] gh/wychi/1/head -> origin/gh/wychi/1/head 
2025-09-07T06:13:37.2181099Z * [new branch] gh/wychi/1/orig -> origin/gh/wychi/1/orig 2025-09-07T06:13:37.2183470Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-09-07T06:13:37.2184744Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-09-07T06:13:37.2186265Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-09-07T06:13:37.2187327Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-09-07T06:13:37.2188924Z * [new branch] gh/xmfan/18/base -> origin/gh/xmfan/18/base 2025-09-07T06:13:37.2190086Z * [new branch] gh/xmfan/18/head -> origin/gh/xmfan/18/head 2025-09-07T06:13:37.2191535Z * [new branch] gh/xmfan/229/base -> origin/gh/xmfan/229/base 2025-09-07T06:13:37.2193120Z * [new branch] gh/xmfan/229/head -> origin/gh/xmfan/229/head 2025-09-07T06:13:37.2194231Z * [new branch] gh/xmfan/229/orig -> origin/gh/xmfan/229/orig 2025-09-07T06:13:37.2195758Z * [new branch] gh/xmfan/237/base -> origin/gh/xmfan/237/base 2025-09-07T06:13:37.2196940Z * [new branch] gh/xmfan/237/head -> origin/gh/xmfan/237/head 2025-09-07T06:13:37.2198083Z * [new branch] gh/xmfan/237/orig -> origin/gh/xmfan/237/orig 2025-09-07T06:13:37.2199727Z * [new branch] gh/xmfan/244/base -> origin/gh/xmfan/244/base 2025-09-07T06:13:37.2200840Z * [new branch] gh/xmfan/244/head -> origin/gh/xmfan/244/head 2025-09-07T06:13:37.2201988Z * [new branch] gh/xmfan/244/orig -> origin/gh/xmfan/244/orig 2025-09-07T06:13:37.2203538Z * [new branch] gh/xmfan/246/base -> origin/gh/xmfan/246/base 2025-09-07T06:13:37.2204785Z * [new branch] gh/xmfan/246/head -> origin/gh/xmfan/246/head 2025-09-07T06:13:37.2205933Z * [new branch] gh/xmfan/246/orig -> origin/gh/xmfan/246/orig 2025-09-07T06:13:37.2207469Z * [new branch] gh/xmfan/253/base -> origin/gh/xmfan/253/base 2025-09-07T06:13:37.2208570Z * [new branch] gh/xmfan/253/head -> origin/gh/xmfan/253/head 2025-09-07T06:13:37.2209677Z * [new branch] gh/xmfan/253/orig -> origin/gh/xmfan/253/orig 
2025-09-07T06:13:37.2211144Z * [new branch] gh/xmfan/254/base -> origin/gh/xmfan/254/base 2025-09-07T06:13:37.2212246Z * [new branch] gh/xmfan/254/head -> origin/gh/xmfan/254/head 2025-09-07T06:13:37.2213656Z * [new branch] gh/xmfan/254/orig -> origin/gh/xmfan/254/orig 2025-09-07T06:13:37.2215270Z * [new branch] gh/xmfan/260/base -> origin/gh/xmfan/260/base 2025-09-07T06:13:37.2216395Z * [new branch] gh/xmfan/260/head -> origin/gh/xmfan/260/head 2025-09-07T06:13:37.2217560Z * [new branch] gh/xmfan/260/orig -> origin/gh/xmfan/260/orig 2025-09-07T06:13:37.2219083Z * [new branch] gh/xmfan/262/base -> origin/gh/xmfan/262/base 2025-09-07T06:13:37.2220202Z * [new branch] gh/xmfan/262/head -> origin/gh/xmfan/262/head 2025-09-07T06:13:37.2221347Z * [new branch] gh/xmfan/262/orig -> origin/gh/xmfan/262/orig 2025-09-07T06:13:37.2223005Z * [new branch] gh/xmfan/263/base -> origin/gh/xmfan/263/base 2025-09-07T06:13:37.2224159Z * [new branch] gh/xmfan/263/head -> origin/gh/xmfan/263/head 2025-09-07T06:13:37.2225437Z * [new branch] gh/xmfan/263/orig -> origin/gh/xmfan/263/orig 2025-09-07T06:13:37.2226886Z * [new branch] gh/xmfan/264/base -> origin/gh/xmfan/264/base 2025-09-07T06:13:37.2228125Z * [new branch] gh/xmfan/264/head -> origin/gh/xmfan/264/head 2025-09-07T06:13:37.2229243Z * [new branch] gh/xmfan/264/orig -> origin/gh/xmfan/264/orig 2025-09-07T06:13:37.2230804Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-09-07T06:13:37.2231922Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-09-07T06:13:37.2233038Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-09-07T06:13:37.2234566Z * [new branch] gh/xmfan/276/base -> origin/gh/xmfan/276/base 2025-09-07T06:13:37.2235648Z * [new branch] gh/xmfan/276/head -> origin/gh/xmfan/276/head 2025-09-07T06:13:37.2236895Z * [new branch] gh/xmfan/276/orig -> origin/gh/xmfan/276/orig 2025-09-07T06:13:37.2238318Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 
2025-09-07T06:13:37.2239888Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-09-07T06:13:37.2241004Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-09-07T06:13:37.2242456Z * [new branch] gh/xmfan/278/base -> origin/gh/xmfan/278/base 2025-09-07T06:13:37.2243625Z * [new branch] gh/xmfan/278/head -> origin/gh/xmfan/278/head 2025-09-07T06:13:37.2244718Z * [new branch] gh/xmfan/278/orig -> origin/gh/xmfan/278/orig 2025-09-07T06:13:37.2246719Z * [new branch] gh/xmfan/279/base -> origin/gh/xmfan/279/base 2025-09-07T06:13:37.2247863Z * [new branch] gh/xmfan/279/head -> origin/gh/xmfan/279/head 2025-09-07T06:13:37.2248962Z * [new branch] gh/xmfan/279/orig -> origin/gh/xmfan/279/orig 2025-09-07T06:13:37.2250490Z * [new branch] gh/xmfan/280/base -> origin/gh/xmfan/280/base 2025-09-07T06:13:37.2251606Z * [new branch] gh/xmfan/280/head -> origin/gh/xmfan/280/head 2025-09-07T06:13:37.2252785Z * [new branch] gh/xmfan/280/orig -> origin/gh/xmfan/280/orig 2025-09-07T06:13:37.2255114Z * [new branch] gh/xmfan/281/base -> origin/gh/xmfan/281/base 2025-09-07T06:13:37.2256267Z * [new branch] gh/xmfan/281/head -> origin/gh/xmfan/281/head 2025-09-07T06:13:37.2257413Z * [new branch] gh/xmfan/281/orig -> origin/gh/xmfan/281/orig 2025-09-07T06:13:37.2258997Z * [new branch] gh/xmfan/282/base -> origin/gh/xmfan/282/base 2025-09-07T06:13:37.2260174Z * [new branch] gh/xmfan/282/head -> origin/gh/xmfan/282/head 2025-09-07T06:13:37.2261754Z * [new branch] gh/xmfan/283/base -> origin/gh/xmfan/283/base 2025-09-07T06:13:37.2263010Z * [new branch] gh/xmfan/283/head -> origin/gh/xmfan/283/head 2025-09-07T06:13:37.2264155Z * [new branch] gh/xmfan/283/orig -> origin/gh/xmfan/283/orig 2025-09-07T06:13:37.2266117Z * [new branch] gh/xuanzhang816/14/base -> origin/gh/xuanzhang816/14/base 2025-09-07T06:13:37.2271156Z * [new branch] gh/xuanzhang816/14/head -> origin/gh/xuanzhang816/14/head 2025-09-07T06:13:37.2272253Z * [new branch] gh/xuanzhang816/14/orig -> 
origin/gh/xuanzhang816/14/orig 2025-09-07T06:13:37.2273794Z * [new branch] gh/xuanzhang816/19/base -> origin/gh/xuanzhang816/19/base 2025-09-07T06:13:37.2274909Z * [new branch] gh/xuanzhang816/19/head -> origin/gh/xuanzhang816/19/head 2025-09-07T06:13:37.2276662Z * [new branch] gh/xuanzhang816/19/orig -> origin/gh/xuanzhang816/19/orig 2025-09-07T06:13:37.2278209Z * [new branch] gh/xuanzhang816/22/base -> origin/gh/xuanzhang816/22/base 2025-09-07T06:13:37.2279328Z * [new branch] gh/xuanzhang816/22/head -> origin/gh/xuanzhang816/22/head 2025-09-07T06:13:37.2280451Z * [new branch] gh/xuanzhang816/22/orig -> origin/gh/xuanzhang816/22/orig 2025-09-07T06:13:37.2282112Z * [new branch] gh/xuanzhang816/23/base -> origin/gh/xuanzhang816/23/base 2025-09-07T06:13:37.2283221Z * [new branch] gh/xuanzhang816/23/head -> origin/gh/xuanzhang816/23/head 2025-09-07T06:13:37.2284319Z * [new branch] gh/xuanzhang816/23/orig -> origin/gh/xuanzhang816/23/orig 2025-09-07T06:13:37.2286288Z * [new branch] gh/xuanzhang816/24/base -> origin/gh/xuanzhang816/24/base 2025-09-07T06:13:37.2287446Z * [new branch] gh/xuanzhang816/24/head -> origin/gh/xuanzhang816/24/head 2025-09-07T06:13:37.2288611Z * [new branch] gh/xuanzhang816/24/orig -> origin/gh/xuanzhang816/24/orig 2025-09-07T06:13:37.2290037Z * [new branch] gh/xuanzhang816/25/base -> origin/gh/xuanzhang816/25/base 2025-09-07T06:13:37.2291127Z * [new branch] gh/xuanzhang816/25/head -> origin/gh/xuanzhang816/25/head 2025-09-07T06:13:37.2292378Z * [new branch] gh/xuanzhang816/25/orig -> origin/gh/xuanzhang816/25/orig 2025-09-07T06:13:37.2294369Z * [new branch] gh/xuanzhang816/26/base -> origin/gh/xuanzhang816/26/base 2025-09-07T06:13:37.2295487Z * [new branch] gh/xuanzhang816/26/head -> origin/gh/xuanzhang816/26/head 2025-09-07T06:13:37.2296632Z * [new branch] gh/xuanzhang816/26/orig -> origin/gh/xuanzhang816/26/orig 2025-09-07T06:13:37.2298749Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-09-07T06:13:37.2299892Z * [new 
branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-09-07T06:13:37.2301159Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 2025-09-07T06:13:37.2302716Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-09-07T06:13:37.2303870Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-09-07T06:13:37.2305118Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-09-07T06:13:37.2306668Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-09-07T06:13:37.2307756Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-09-07T06:13:37.2308868Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-09-07T06:13:37.2310385Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-09-07T06:13:37.2311510Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-09-07T06:13:37.2312649Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-09-07T06:13:37.2314130Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-09-07T06:13:37.2315244Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-09-07T06:13:37.2316356Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-09-07T06:13:37.2317762Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-09-07T06:13:37.2318894Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-09-07T06:13:37.2320012Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-09-07T06:13:37.2321506Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-09-07T06:13:37.2322641Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-09-07T06:13:37.2323692Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-09-07T06:13:37.2325271Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 
2025-09-07T06:13:37.2326322Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-09-07T06:13:37.2327505Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-09-07T06:13:37.2329176Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-09-07T06:13:37.2330249Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-09-07T06:13:37.2331793Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-09-07T06:13:37.2332917Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-09-07T06:13:37.2334367Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-09-07T06:13:37.2335894Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-09-07T06:13:37.2336962Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-09-07T06:13:37.2338154Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-09-07T06:13:37.2339718Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-09-07T06:13:37.2340831Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-09-07T06:13:37.2341962Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-09-07T06:13:37.2343672Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-09-07T06:13:37.2344823Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-09-07T06:13:37.2345994Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-09-07T06:13:37.2347534Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-09-07T06:13:37.2348647Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-09-07T06:13:37.2349725Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-09-07T06:13:37.2351262Z * [new branch] gh/yanbing-j/36/base -> origin/gh/yanbing-j/36/base 2025-09-07T06:13:37.2352314Z * [new branch] gh/yanbing-j/36/head -> 
origin/gh/yanbing-j/36/head 2025-09-07T06:13:37.2353443Z * [new branch] gh/yanbing-j/36/orig -> origin/gh/yanbing-j/36/orig 2025-09-07T06:13:37.2354967Z * [new branch] gh/yanbing-j/37/base -> origin/gh/yanbing-j/37/base 2025-09-07T06:13:37.2356050Z * [new branch] gh/yanbing-j/37/head -> origin/gh/yanbing-j/37/head 2025-09-07T06:13:37.2357165Z * [new branch] gh/yanbing-j/37/orig -> origin/gh/yanbing-j/37/orig 2025-09-07T06:13:37.2359124Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-09-07T06:13:37.2360223Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-09-07T06:13:37.2361323Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-09-07T06:13:37.2362902Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-09-07T06:13:37.2364017Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-09-07T06:13:37.2365069Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-09-07T06:13:37.2366807Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-09-07T06:13:37.2367841Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-09-07T06:13:37.2368997Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-09-07T06:13:37.2370638Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-09-07T06:13:37.2371714Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-09-07T06:13:37.2372894Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-09-07T06:13:37.2374871Z * [new branch] gh/yangw-dev/16/base -> origin/gh/yangw-dev/16/base 2025-09-07T06:13:37.2375945Z * [new branch] gh/yangw-dev/16/head -> origin/gh/yangw-dev/16/head 2025-09-07T06:13:37.2377622Z * [new branch] gh/yangw-dev/16/orig -> origin/gh/yangw-dev/16/orig 2025-09-07T06:13:37.2379180Z * [new branch] gh/yangw-dev/17/base -> origin/gh/yangw-dev/17/base 2025-09-07T06:13:37.2380365Z * [new branch] 
gh/yangw-dev/17/head -> origin/gh/yangw-dev/17/head 2025-09-07T06:13:37.2381427Z * [new branch] gh/yangw-dev/17/orig -> origin/gh/yangw-dev/17/orig 2025-09-07T06:13:37.2382987Z * [new branch] gh/yangw-dev/18/base -> origin/gh/yangw-dev/18/base 2025-09-07T06:13:37.2384057Z * [new branch] gh/yangw-dev/18/head -> origin/gh/yangw-dev/18/head 2025-09-07T06:13:37.2385335Z * [new branch] gh/yangw-dev/18/orig -> origin/gh/yangw-dev/18/orig 2025-09-07T06:13:37.2386880Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-09-07T06:13:37.2387973Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-09-07T06:13:37.2389082Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-09-07T06:13:37.2391174Z * [new branch] gh/yangw-dev/20/base -> origin/gh/yangw-dev/20/base 2025-09-07T06:13:37.2392361Z * [new branch] gh/yangw-dev/20/head -> origin/gh/yangw-dev/20/head 2025-09-07T06:13:37.2394457Z * [new branch] gh/yangw-dev/20/orig -> origin/gh/yangw-dev/20/orig 2025-09-07T06:13:37.2396049Z * [new branch] gh/yangw-dev/21/base -> origin/gh/yangw-dev/21/base 2025-09-07T06:13:37.2397183Z * [new branch] gh/yangw-dev/21/head -> origin/gh/yangw-dev/21/head 2025-09-07T06:13:37.2398345Z * [new branch] gh/yangw-dev/21/orig -> origin/gh/yangw-dev/21/orig 2025-09-07T06:13:37.2399922Z * [new branch] gh/yangw-dev/22/base -> origin/gh/yangw-dev/22/base 2025-09-07T06:13:37.2401034Z * [new branch] gh/yangw-dev/22/head -> origin/gh/yangw-dev/22/head 2025-09-07T06:13:37.2402150Z * [new branch] gh/yangw-dev/22/orig -> origin/gh/yangw-dev/22/orig 2025-09-07T06:13:37.2403692Z * [new branch] gh/yangw-dev/23/base -> origin/gh/yangw-dev/23/base 2025-09-07T06:13:37.2404927Z * [new branch] gh/yangw-dev/23/head -> origin/gh/yangw-dev/23/head 2025-09-07T06:13:37.2406031Z * [new branch] gh/yangw-dev/23/orig -> origin/gh/yangw-dev/23/orig 2025-09-07T06:13:37.2407668Z * [new branch] gh/yangw-dev/24/base -> origin/gh/yangw-dev/24/base 
2025-09-07T06:13:37.2408704Z * [new branch] gh/yangw-dev/24/head -> origin/gh/yangw-dev/24/head 2025-09-07T06:13:37.2409858Z * [new branch] gh/yangw-dev/24/orig -> origin/gh/yangw-dev/24/orig 2025-09-07T06:13:37.2411431Z * [new branch] gh/yangw-dev/25/base -> origin/gh/yangw-dev/25/base 2025-09-07T06:13:37.2412446Z * [new branch] gh/yangw-dev/25/head -> origin/gh/yangw-dev/25/head 2025-09-07T06:13:37.2413891Z * [new branch] gh/yangw-dev/25/orig -> origin/gh/yangw-dev/25/orig 2025-09-07T06:13:37.2415497Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-09-07T06:13:37.2416594Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-09-07T06:13:37.2417732Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-09-07T06:13:37.2419338Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-09-07T06:13:37.2420430Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-09-07T06:13:37.2421584Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-09-07T06:13:37.2423601Z * [new branch] gh/ydwu4/233/base -> origin/gh/ydwu4/233/base 2025-09-07T06:13:37.2424771Z * [new branch] gh/ydwu4/233/head -> origin/gh/ydwu4/233/head 2025-09-07T06:13:37.2426005Z * [new branch] gh/ydwu4/233/orig -> origin/gh/ydwu4/233/orig 2025-09-07T06:13:37.2427935Z * [new branch] gh/ydwu4/246/base -> origin/gh/ydwu4/246/base 2025-09-07T06:13:37.2428932Z * [new branch] gh/ydwu4/246/head -> origin/gh/ydwu4/246/head 2025-09-07T06:13:37.2430004Z * [new branch] gh/ydwu4/246/orig -> origin/gh/ydwu4/246/orig 2025-09-07T06:13:37.2431696Z * [new branch] gh/ydwu4/253/base -> origin/gh/ydwu4/253/base 2025-09-07T06:13:37.2432857Z * [new branch] gh/ydwu4/253/head -> origin/gh/ydwu4/253/head 2025-09-07T06:13:37.2434152Z * [new branch] gh/ydwu4/253/orig -> origin/gh/ydwu4/253/orig 2025-09-07T06:13:37.2435719Z * [new branch] gh/ydwu4/255/base -> origin/gh/ydwu4/255/base 2025-09-07T06:13:37.2436785Z * [new branch] 
gh/ydwu4/255/head -> origin/gh/ydwu4/255/head 2025-09-07T06:13:37.2438065Z * [new branch] gh/ydwu4/255/orig -> origin/gh/ydwu4/255/orig 2025-09-07T06:13:37.2439755Z * [new branch] gh/ydwu4/259/base -> origin/gh/ydwu4/259/base 2025-09-07T06:13:37.2440745Z * [new branch] gh/ydwu4/259/head -> origin/gh/ydwu4/259/head 2025-09-07T06:13:37.2441823Z * [new branch] gh/ydwu4/259/orig -> origin/gh/ydwu4/259/orig 2025-09-07T06:13:37.2443483Z * [new branch] gh/ydwu4/262/base -> origin/gh/ydwu4/262/base 2025-09-07T06:13:37.2444566Z * [new branch] gh/ydwu4/262/head -> origin/gh/ydwu4/262/head 2025-09-07T06:13:37.2445708Z * [new branch] gh/ydwu4/262/orig -> origin/gh/ydwu4/262/orig 2025-09-07T06:13:37.2447269Z * [new branch] gh/ydwu4/263/base -> origin/gh/ydwu4/263/base 2025-09-07T06:13:37.2448430Z * [new branch] gh/ydwu4/263/head -> origin/gh/ydwu4/263/head 2025-09-07T06:13:37.2449524Z * [new branch] gh/ydwu4/263/orig -> origin/gh/ydwu4/263/orig 2025-09-07T06:13:37.2451239Z * [new branch] gh/ydwu4/269/base -> origin/gh/ydwu4/269/base 2025-09-07T06:13:37.2452215Z * [new branch] gh/ydwu4/269/head -> origin/gh/ydwu4/269/head 2025-09-07T06:13:37.2453815Z * [new branch] gh/ydwu4/269/orig -> origin/gh/ydwu4/269/orig 2025-09-07T06:13:37.2455441Z * [new branch] gh/ydwu4/270/base -> origin/gh/ydwu4/270/base 2025-09-07T06:13:37.2456591Z * [new branch] gh/ydwu4/270/head -> origin/gh/ydwu4/270/head 2025-09-07T06:13:37.2457767Z * [new branch] gh/ydwu4/270/orig -> origin/gh/ydwu4/270/orig 2025-09-07T06:13:37.2459411Z * [new branch] gh/ydwu4/272/base -> origin/gh/ydwu4/272/base 2025-09-07T06:13:37.2460700Z * [new branch] gh/ydwu4/272/head -> origin/gh/ydwu4/272/head 2025-09-07T06:13:37.2461806Z * [new branch] gh/ydwu4/272/orig -> origin/gh/ydwu4/272/orig 2025-09-07T06:13:37.2463285Z * [new branch] gh/ydwu4/275/base -> origin/gh/ydwu4/275/base 2025-09-07T06:13:37.2464382Z * [new branch] gh/ydwu4/275/head -> origin/gh/ydwu4/275/head 2025-09-07T06:13:37.2465602Z * [new branch] gh/ydwu4/275/orig 
-> origin/gh/ydwu4/275/orig 2025-09-07T06:13:37.2467050Z * [new branch] gh/ydwu4/276/base -> origin/gh/ydwu4/276/base 2025-09-07T06:13:37.2468081Z * [new branch] gh/ydwu4/276/head -> origin/gh/ydwu4/276/head 2025-09-07T06:13:37.2469352Z * [new branch] gh/ydwu4/276/orig -> origin/gh/ydwu4/276/orig 2025-09-07T06:13:37.2471013Z * [new branch] gh/ydwu4/279/base -> origin/gh/ydwu4/279/base 2025-09-07T06:13:37.2472284Z * [new branch] gh/ydwu4/279/head -> origin/gh/ydwu4/279/head 2025-09-07T06:13:37.2473352Z * [new branch] gh/ydwu4/279/orig -> origin/gh/ydwu4/279/orig 2025-09-07T06:13:37.2475318Z * [new branch] gh/ydwu4/283/base -> origin/gh/ydwu4/283/base 2025-09-07T06:13:37.2476278Z * [new branch] gh/ydwu4/283/head -> origin/gh/ydwu4/283/head 2025-09-07T06:13:37.2477430Z * [new branch] gh/ydwu4/283/orig -> origin/gh/ydwu4/283/orig 2025-09-07T06:13:37.2479015Z * [new branch] gh/ydwu4/289/base -> origin/gh/ydwu4/289/base 2025-09-07T06:13:37.2480066Z * [new branch] gh/ydwu4/289/head -> origin/gh/ydwu4/289/head 2025-09-07T06:13:37.2481182Z * [new branch] gh/ydwu4/289/orig -> origin/gh/ydwu4/289/orig 2025-09-07T06:13:37.2482850Z * [new branch] gh/ydwu4/290/base -> origin/gh/ydwu4/290/base 2025-09-07T06:13:37.2483991Z * [new branch] gh/ydwu4/290/head -> origin/gh/ydwu4/290/head 2025-09-07T06:13:37.2485307Z * [new branch] gh/ydwu4/290/orig -> origin/gh/ydwu4/290/orig 2025-09-07T06:13:37.2487342Z * [new branch] gh/ydwu4/291/base -> origin/gh/ydwu4/291/base 2025-09-07T06:13:37.2488491Z * [new branch] gh/ydwu4/291/head -> origin/gh/ydwu4/291/head 2025-09-07T06:13:37.2489644Z * [new branch] gh/ydwu4/291/orig -> origin/gh/ydwu4/291/orig 2025-09-07T06:13:37.2491391Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-09-07T06:13:37.2492904Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-09-07T06:13:37.2494067Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-09-07T06:13:37.2495670Z * [new branch] gh/ydwu4/293/base -> 
origin/gh/ydwu4/293/base 2025-09-07T06:13:37.2496744Z * [new branch] gh/ydwu4/293/head -> origin/gh/ydwu4/293/head 2025-09-07T06:13:37.2497878Z * [new branch] gh/ydwu4/293/orig -> origin/gh/ydwu4/293/orig 2025-09-07T06:13:37.2499655Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-09-07T06:13:37.2500771Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-09-07T06:13:37.2502184Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-09-07T06:13:37.2503882Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-09-07T06:13:37.2505131Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-09-07T06:13:37.2506242Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-09-07T06:13:37.2507764Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-09-07T06:13:37.2508766Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-09-07T06:13:37.2509934Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-09-07T06:13:37.2512531Z * [new branch] gh/ydwu4/300/base -> origin/gh/ydwu4/300/base 2025-09-07T06:13:37.2514235Z * [new branch] gh/ydwu4/300/head -> origin/gh/ydwu4/300/head 2025-09-07T06:13:37.2515497Z * [new branch] gh/ydwu4/300/orig -> origin/gh/ydwu4/300/orig 2025-09-07T06:13:37.2517453Z * [new branch] gh/ydwu4/301/base -> origin/gh/ydwu4/301/base 2025-09-07T06:13:37.2518452Z * [new branch] gh/ydwu4/301/head -> origin/gh/ydwu4/301/head 2025-09-07T06:13:37.2519716Z * [new branch] gh/ydwu4/301/orig -> origin/gh/ydwu4/301/orig 2025-09-07T06:13:37.2521284Z * [new branch] gh/ydwu4/302/base -> origin/gh/ydwu4/302/base 2025-09-07T06:13:37.2522339Z * [new branch] gh/ydwu4/302/head -> origin/gh/ydwu4/302/head 2025-09-07T06:13:37.2523469Z * [new branch] gh/ydwu4/302/orig -> origin/gh/ydwu4/302/orig 2025-09-07T06:13:37.2525066Z * [new branch] gh/ydwu4/303/base -> origin/gh/ydwu4/303/base 2025-09-07T06:13:37.2526035Z * [new branch] gh/ydwu4/303/head -> 
origin/gh/ydwu4/303/head 2025-09-07T06:13:37.2527180Z * [new branch] gh/ydwu4/303/orig -> origin/gh/ydwu4/303/orig 2025-09-07T06:13:37.2528636Z * [new branch] gh/ydwu4/304/base -> origin/gh/ydwu4/304/base 2025-09-07T06:13:37.2529713Z * [new branch] gh/ydwu4/304/head -> origin/gh/ydwu4/304/head 2025-09-07T06:13:37.2530836Z * [new branch] gh/ydwu4/304/orig -> origin/gh/ydwu4/304/orig 2025-09-07T06:13:37.2532573Z * [new branch] gh/ydwu4/305/base -> origin/gh/ydwu4/305/base 2025-09-07T06:13:37.2534031Z * [new branch] gh/ydwu4/305/head -> origin/gh/ydwu4/305/head 2025-09-07T06:13:37.2535340Z * [new branch] gh/ydwu4/305/orig -> origin/gh/ydwu4/305/orig 2025-09-07T06:13:37.2537047Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-09-07T06:13:37.2538206Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-09-07T06:13:37.2539561Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-09-07T06:13:37.2541089Z * [new branch] gh/ydwu4/307/base -> origin/gh/ydwu4/307/base 2025-09-07T06:13:37.2542133Z * [new branch] gh/ydwu4/307/head -> origin/gh/ydwu4/307/head 2025-09-07T06:13:37.2543270Z * [new branch] gh/ydwu4/307/orig -> origin/gh/ydwu4/307/orig 2025-09-07T06:13:37.2545107Z * [new branch] gh/ydwu4/308/base -> origin/gh/ydwu4/308/base 2025-09-07T06:13:37.2546196Z * [new branch] gh/ydwu4/308/head -> origin/gh/ydwu4/308/head 2025-09-07T06:13:37.2547356Z * [new branch] gh/ydwu4/308/orig -> origin/gh/ydwu4/308/orig 2025-09-07T06:13:37.2548902Z * [new branch] gh/ydwu4/309/base -> origin/gh/ydwu4/309/base 2025-09-07T06:13:37.2549932Z * [new branch] gh/ydwu4/309/head -> origin/gh/ydwu4/309/head 2025-09-07T06:13:37.2551264Z * [new branch] gh/ydwu4/309/orig -> origin/gh/ydwu4/309/orig 2025-09-07T06:13:37.2552977Z * [new branch] gh/ydwu4/310/base -> origin/gh/ydwu4/310/base 2025-09-07T06:13:37.2554365Z * [new branch] gh/ydwu4/310/head -> origin/gh/ydwu4/310/head 2025-09-07T06:13:37.2555428Z * [new branch] gh/ydwu4/310/orig -> 
origin/gh/ydwu4/310/orig 2025-09-07T06:13:37.2557022Z * [new branch] gh/ydwu4/311/base -> origin/gh/ydwu4/311/base 2025-09-07T06:13:37.2558117Z * [new branch] gh/ydwu4/311/head -> origin/gh/ydwu4/311/head 2025-09-07T06:13:37.2559215Z * [new branch] gh/ydwu4/311/orig -> origin/gh/ydwu4/311/orig 2025-09-07T06:13:37.2560800Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-09-07T06:13:37.2561841Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-09-07T06:13:37.2562998Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-09-07T06:13:37.2564900Z * [new branch] gh/ydwu4/313/base -> origin/gh/ydwu4/313/base 2025-09-07T06:13:37.2566079Z * [new branch] gh/ydwu4/313/head -> origin/gh/ydwu4/313/head 2025-09-07T06:13:37.2567368Z * [new branch] gh/ydwu4/313/orig -> origin/gh/ydwu4/313/orig 2025-09-07T06:13:37.2569479Z * [new branch] gh/ydwu4/314/base -> origin/gh/ydwu4/314/base 2025-09-07T06:13:37.2570826Z * [new branch] gh/ydwu4/314/head -> origin/gh/ydwu4/314/head 2025-09-07T06:13:37.2571980Z * [new branch] gh/ydwu4/314/orig -> origin/gh/ydwu4/314/orig 2025-09-07T06:13:37.2574099Z * [new branch] gh/ydwu4/315/base -> origin/gh/ydwu4/315/base 2025-09-07T06:13:37.2575124Z * [new branch] gh/ydwu4/315/head -> origin/gh/ydwu4/315/head 2025-09-07T06:13:37.2576336Z * [new branch] gh/ydwu4/315/orig -> origin/gh/ydwu4/315/orig 2025-09-07T06:13:37.2578103Z * [new branch] gh/ydwu4/316/base -> origin/gh/ydwu4/316/base 2025-09-07T06:13:37.2579273Z * [new branch] gh/ydwu4/316/head -> origin/gh/ydwu4/316/head 2025-09-07T06:13:37.2580441Z * [new branch] gh/ydwu4/316/orig -> origin/gh/ydwu4/316/orig 2025-09-07T06:13:37.2582168Z * [new branch] gh/ydwu4/317/base -> origin/gh/ydwu4/317/base 2025-09-07T06:13:37.2583187Z * [new branch] gh/ydwu4/317/head -> origin/gh/ydwu4/317/head 2025-09-07T06:13:37.2584520Z * [new branch] gh/ydwu4/317/orig -> origin/gh/ydwu4/317/orig 2025-09-07T06:13:37.2586253Z * [new branch] gh/ydwu4/318/base -> 
origin/gh/ydwu4/318/base 2025-09-07T06:13:37.2587512Z * [new branch] gh/ydwu4/318/head -> origin/gh/ydwu4/318/head 2025-09-07T06:13:37.2588461Z * [new branch] gh/ydwu4/318/orig -> origin/gh/ydwu4/318/orig 2025-09-07T06:13:37.2589929Z * [new branch] gh/ydwu4/319/base -> origin/gh/ydwu4/319/base 2025-09-07T06:13:37.2590985Z * [new branch] gh/ydwu4/319/head -> origin/gh/ydwu4/319/head 2025-09-07T06:13:37.2592432Z * [new branch] gh/ydwu4/319/orig -> origin/gh/ydwu4/319/orig 2025-09-07T06:13:37.2595689Z * [new branch] gh/ydwu4/320/base -> origin/gh/ydwu4/320/base 2025-09-07T06:13:37.2596724Z * [new branch] gh/ydwu4/320/head -> origin/gh/ydwu4/320/head 2025-09-07T06:13:37.2597908Z * [new branch] gh/ydwu4/320/orig -> origin/gh/ydwu4/320/orig 2025-09-07T06:13:37.2599433Z * [new branch] gh/ydwu4/321/base -> origin/gh/ydwu4/321/base 2025-09-07T06:13:37.2600536Z * [new branch] gh/ydwu4/321/head -> origin/gh/ydwu4/321/head 2025-09-07T06:13:37.2601826Z * [new branch] gh/ydwu4/321/orig -> origin/gh/ydwu4/321/orig 2025-09-07T06:13:37.2603429Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-09-07T06:13:37.2604569Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-09-07T06:13:37.2605774Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-09-07T06:13:37.2607344Z * [new branch] gh/ydwu4/323/base -> origin/gh/ydwu4/323/base 2025-09-07T06:13:37.2608421Z * [new branch] gh/ydwu4/323/head -> origin/gh/ydwu4/323/head 2025-09-07T06:13:37.2609519Z * [new branch] gh/ydwu4/323/orig -> origin/gh/ydwu4/323/orig 2025-09-07T06:13:37.2611118Z * [new branch] gh/ydwu4/324/base -> origin/gh/ydwu4/324/base 2025-09-07T06:13:37.2612173Z * [new branch] gh/ydwu4/324/head -> origin/gh/ydwu4/324/head 2025-09-07T06:13:37.2613623Z * [new branch] gh/ydwu4/324/orig -> origin/gh/ydwu4/324/orig 2025-09-07T06:13:37.2615577Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-09-07T06:13:37.2616777Z * [new branch] gh/yf225/133/head -> 
origin/gh/yf225/133/head 2025-09-07T06:13:37.2618691Z * [new branch] gh/yf225/171/base -> origin/gh/yf225/171/base 2025-09-07T06:13:37.2619892Z * [new branch] gh/yf225/171/head -> origin/gh/yf225/171/head 2025-09-07T06:13:37.2621068Z * [new branch] gh/yf225/171/orig -> origin/gh/yf225/171/orig 2025-09-07T06:13:37.2622753Z * [new branch] gh/yf225/172/base -> origin/gh/yf225/172/base 2025-09-07T06:13:37.2623859Z * [new branch] gh/yf225/172/head -> origin/gh/yf225/172/head 2025-09-07T06:13:37.2624858Z * [new branch] gh/yf225/172/orig -> origin/gh/yf225/172/orig 2025-09-07T06:13:37.2626552Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-09-07T06:13:37.2627609Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-09-07T06:13:37.2630104Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 2025-09-07T06:13:37.2631502Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-09-07T06:13:37.2632811Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-09-07T06:13:37.2634335Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-09-07T06:13:37.2635447Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-09-07T06:13:37.2636563Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-09-07T06:13:37.2638565Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-09-07T06:13:37.2639629Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-09-07T06:13:37.2641040Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-09-07T06:13:37.2642028Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-09-07T06:13:37.2644050Z * [new branch] gh/ysiraichi/79/base -> origin/gh/ysiraichi/79/base 2025-09-07T06:13:37.2645626Z * [new branch] gh/ysiraichi/79/head -> origin/gh/ysiraichi/79/head 2025-09-07T06:13:37.2647063Z * [new branch] gh/ysiraichi/79/orig -> origin/gh/ysiraichi/79/orig 
2025-09-07T06:13:37.2648606Z * [new branch] gh/ysiraichi/88/base -> origin/gh/ysiraichi/88/base 2025-09-07T06:13:37.2649674Z * [new branch] gh/ysiraichi/88/head -> origin/gh/ysiraichi/88/head 2025-09-07T06:13:37.2650821Z * [new branch] gh/ysiraichi/88/orig -> origin/gh/ysiraichi/88/orig 2025-09-07T06:13:37.2652865Z * [new branch] gh/zhxchen17/25/base -> origin/gh/zhxchen17/25/base 2025-09-07T06:13:37.2654250Z * [new branch] gh/zhxchen17/25/head -> origin/gh/zhxchen17/25/head 2025-09-07T06:13:37.2655868Z * [new branch] gh/zhxchen17/25/orig -> origin/gh/zhxchen17/25/orig 2025-09-07T06:13:37.2657696Z * [new branch] gh/zhxchen17/31/base -> origin/gh/zhxchen17/31/base 2025-09-07T06:13:37.2658855Z * [new branch] gh/zhxchen17/31/head -> origin/gh/zhxchen17/31/head 2025-09-07T06:13:37.2660091Z * [new branch] gh/zhxchen17/31/orig -> origin/gh/zhxchen17/31/orig 2025-09-07T06:13:37.2661716Z * [new branch] gh/zhxchen17/34/base -> origin/gh/zhxchen17/34/base 2025-09-07T06:13:37.2662913Z * [new branch] gh/zhxchen17/34/head -> origin/gh/zhxchen17/34/head 2025-09-07T06:13:37.2664350Z * [new branch] gh/zhxchen17/35/base -> origin/gh/zhxchen17/35/base 2025-09-07T06:13:37.2665522Z * [new branch] gh/zhxchen17/35/head -> origin/gh/zhxchen17/35/head 2025-09-07T06:13:37.2667348Z * [new branch] gh/zhxchen17/37/base -> origin/gh/zhxchen17/37/base 2025-09-07T06:13:37.2668403Z * [new branch] gh/zhxchen17/37/head -> origin/gh/zhxchen17/37/head 2025-09-07T06:13:37.2669574Z * [new branch] gh/zhxchen17/37/orig -> origin/gh/zhxchen17/37/orig 2025-09-07T06:13:37.2671413Z * [new branch] gh/zhxchen17/38/base -> origin/gh/zhxchen17/38/base 2025-09-07T06:13:37.2672429Z * [new branch] gh/zhxchen17/38/head -> origin/gh/zhxchen17/38/head 2025-09-07T06:13:37.2673598Z * [new branch] gh/zhxchen17/38/orig -> origin/gh/zhxchen17/38/orig 2025-09-07T06:13:37.2674986Z * [new branch] gh/zhxchen17/39/base -> origin/gh/zhxchen17/39/base 2025-09-07T06:13:37.2676197Z * [new branch] gh/zhxchen17/39/head -> 
origin/gh/zhxchen17/39/head 2025-09-07T06:13:37.2677333Z * [new branch] gh/zhxchen17/39/orig -> origin/gh/zhxchen17/39/orig 2025-09-07T06:13:37.2679566Z * [new branch] gh/zhxchen17/40/base -> origin/gh/zhxchen17/40/base 2025-09-07T06:13:37.2680721Z * [new branch] gh/zhxchen17/40/head -> origin/gh/zhxchen17/40/head 2025-09-07T06:13:37.2681993Z * [new branch] gh/zhxchen17/40/orig -> origin/gh/zhxchen17/40/orig 2025-09-07T06:13:37.2683616Z * [new branch] gh/zhxchen17/41/base -> origin/gh/zhxchen17/41/base 2025-09-07T06:13:37.2684784Z * [new branch] gh/zhxchen17/41/head -> origin/gh/zhxchen17/41/head 2025-09-07T06:13:37.2686246Z * [new branch] gh/zhxchen17/41/orig -> origin/gh/zhxchen17/41/orig 2025-09-07T06:13:37.2687981Z * [new branch] gh/zhxchen17/42/base -> origin/gh/zhxchen17/42/base 2025-09-07T06:13:37.2689201Z * [new branch] gh/zhxchen17/42/head -> origin/gh/zhxchen17/42/head 2025-09-07T06:13:37.2690549Z * [new branch] gh/zhxchen17/42/orig -> origin/gh/zhxchen17/42/orig 2025-09-07T06:13:37.2692440Z * [new branch] gh/zhxchen17/43/base -> origin/gh/zhxchen17/43/base 2025-09-07T06:13:37.2694206Z * [new branch] gh/zhxchen17/43/head -> origin/gh/zhxchen17/43/head 2025-09-07T06:13:37.2695873Z * [new branch] gh/zhxchen17/43/orig -> origin/gh/zhxchen17/43/orig 2025-09-07T06:13:37.2697757Z * [new branch] gh/zhxchen17/44/base -> origin/gh/zhxchen17/44/base 2025-09-07T06:13:37.2698835Z * [new branch] gh/zhxchen17/44/head -> origin/gh/zhxchen17/44/head 2025-09-07T06:13:37.2700067Z * [new branch] gh/zhxchen17/44/orig -> origin/gh/zhxchen17/44/orig 2025-09-07T06:13:37.2701661Z * [new branch] gh/zhxchen17/45/base -> origin/gh/zhxchen17/45/base 2025-09-07T06:13:37.2702855Z * [new branch] gh/zhxchen17/45/head -> origin/gh/zhxchen17/45/head 2025-09-07T06:13:37.2704188Z * [new branch] gh/zhxchen17/45/orig -> origin/gh/zhxchen17/45/orig 2025-09-07T06:13:37.2706127Z * [new branch] gh/zklaus/10/base -> origin/gh/zklaus/10/base 2025-09-07T06:13:37.2707237Z * [new branch] 
gh/zklaus/10/head -> origin/gh/zklaus/10/head 2025-09-07T06:13:37.2708351Z * [new branch] gh/zklaus/10/orig -> origin/gh/zklaus/10/orig 2025-09-07T06:13:37.2710336Z * [new branch] gh/zklaus/11/base -> origin/gh/zklaus/11/base 2025-09-07T06:13:37.2711413Z * [new branch] gh/zklaus/11/head -> origin/gh/zklaus/11/head 2025-09-07T06:13:37.2712693Z * [new branch] gh/zklaus/11/orig -> origin/gh/zklaus/11/orig 2025-09-07T06:13:37.2714189Z * [new branch] gh/zklaus/12/base -> origin/gh/zklaus/12/base 2025-09-07T06:13:37.2715277Z * [new branch] gh/zklaus/12/head -> origin/gh/zklaus/12/head 2025-09-07T06:13:37.2716540Z * [new branch] gh/zklaus/12/orig -> origin/gh/zklaus/12/orig 2025-09-07T06:13:37.2718117Z * [new branch] gh/zklaus/14/base -> origin/gh/zklaus/14/base 2025-09-07T06:13:37.2719190Z * [new branch] gh/zklaus/14/head -> origin/gh/zklaus/14/head 2025-09-07T06:13:37.2720293Z * [new branch] gh/zklaus/14/orig -> origin/gh/zklaus/14/orig 2025-09-07T06:13:37.2721887Z * [new branch] gh/zklaus/15/base -> origin/gh/zklaus/15/base 2025-09-07T06:13:37.2722962Z * [new branch] gh/zklaus/15/head -> origin/gh/zklaus/15/head 2025-09-07T06:13:37.2724203Z * [new branch] gh/zklaus/15/orig -> origin/gh/zklaus/15/orig 2025-09-07T06:13:37.2725786Z * [new branch] gh/zklaus/16/base -> origin/gh/zklaus/16/base 2025-09-07T06:13:37.2726814Z * [new branch] gh/zklaus/16/head -> origin/gh/zklaus/16/head 2025-09-07T06:13:37.2728077Z * [new branch] gh/zklaus/16/orig -> origin/gh/zklaus/16/orig 2025-09-07T06:13:37.2729651Z * [new branch] gh/zklaus/17/base -> origin/gh/zklaus/17/base 2025-09-07T06:13:37.2730711Z * [new branch] gh/zklaus/17/head -> origin/gh/zklaus/17/head 2025-09-07T06:13:37.2731874Z * [new branch] gh/zklaus/17/orig -> origin/gh/zklaus/17/orig 2025-09-07T06:13:37.2733675Z * [new branch] gh/zklaus/18/base -> origin/gh/zklaus/18/base 2025-09-07T06:13:37.2734778Z * [new branch] gh/zklaus/18/head -> origin/gh/zklaus/18/head 2025-09-07T06:13:37.2735946Z * [new branch] gh/zklaus/18/orig 
-> origin/gh/zklaus/18/orig 2025-09-07T06:13:37.2737510Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-09-07T06:13:37.2738645Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-09-07T06:13:37.2739759Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-09-07T06:13:37.2741387Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-09-07T06:13:37.2742468Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-09-07T06:13:37.2743816Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-09-07T06:13:37.2745561Z * [new branch] gh/zklaus/7/base -> origin/gh/zklaus/7/base 2025-09-07T06:13:37.2746659Z * [new branch] gh/zklaus/7/head -> origin/gh/zklaus/7/head 2025-09-07T06:13:37.2747740Z * [new branch] gh/zklaus/7/orig -> origin/gh/zklaus/7/orig 2025-09-07T06:13:37.2749210Z * [new branch] gh/zklaus/9/base -> origin/gh/zklaus/9/base 2025-09-07T06:13:37.2750275Z * [new branch] gh/zklaus/9/head -> origin/gh/zklaus/9/head 2025-09-07T06:13:37.2751379Z * [new branch] gh/zklaus/9/orig -> origin/gh/zklaus/9/orig 2025-09-07T06:13:37.2753259Z * [new branch] gh/zou3519/1175/base -> origin/gh/zou3519/1175/base 2025-09-07T06:13:37.2754306Z * [new branch] gh/zou3519/1175/head -> origin/gh/zou3519/1175/head 2025-09-07T06:13:37.2755487Z * [new branch] gh/zou3519/1175/orig -> origin/gh/zou3519/1175/orig 2025-09-07T06:13:37.2757098Z * [new branch] gh/zou3519/1177/base -> origin/gh/zou3519/1177/base 2025-09-07T06:13:37.2758263Z * [new branch] gh/zou3519/1177/head -> origin/gh/zou3519/1177/head 2025-09-07T06:13:37.2759402Z * [new branch] gh/zou3519/1177/orig -> origin/gh/zou3519/1177/orig 2025-09-07T06:13:37.2761484Z * [new branch] gh/zou3519/1191/base -> origin/gh/zou3519/1191/base 2025-09-07T06:13:37.2762794Z * [new branch] gh/zou3519/1191/head -> origin/gh/zou3519/1191/head 2025-09-07T06:13:37.2763923Z * [new branch] gh/zou3519/1191/orig -> origin/gh/zou3519/1191/orig 2025-09-07T06:13:37.2765580Z * [new 
branch] gh/zou3519/1192/base -> origin/gh/zou3519/1192/base 2025-09-07T06:13:37.2766674Z * [new branch] gh/zou3519/1192/head -> origin/gh/zou3519/1192/head 2025-09-07T06:13:37.2767835Z * [new branch] gh/zou3519/1192/orig -> origin/gh/zou3519/1192/orig 2025-09-07T06:13:37.2769238Z * [new branch] gh/zou3519/1193/base -> origin/gh/zou3519/1193/base 2025-09-07T06:13:37.2770370Z * [new branch] gh/zou3519/1193/head -> origin/gh/zou3519/1193/head 2025-09-07T06:13:37.2771403Z * [new branch] gh/zou3519/1193/orig -> origin/gh/zou3519/1193/orig 2025-09-07T06:13:37.2772909Z * [new branch] gh/zou3519/1194/base -> origin/gh/zou3519/1194/base 2025-09-07T06:13:37.2774559Z * [new branch] gh/zou3519/1194/head -> origin/gh/zou3519/1194/head 2025-09-07T06:13:37.2775658Z * [new branch] gh/zou3519/1194/orig -> origin/gh/zou3519/1194/orig 2025-09-07T06:13:37.2777331Z * [new branch] gh/zou3519/1195/base -> origin/gh/zou3519/1195/base 2025-09-07T06:13:37.2778559Z * [new branch] gh/zou3519/1195/head -> origin/gh/zou3519/1195/head 2025-09-07T06:13:37.2779775Z * [new branch] gh/zou3519/1195/orig -> origin/gh/zou3519/1195/orig 2025-09-07T06:13:37.2781226Z * [new branch] gh/zou3519/1196/base -> origin/gh/zou3519/1196/base 2025-09-07T06:13:37.2782424Z * [new branch] gh/zou3519/1196/head -> origin/gh/zou3519/1196/head 2025-09-07T06:13:37.2783789Z * [new branch] gh/zou3519/1196/orig -> origin/gh/zou3519/1196/orig 2025-09-07T06:13:37.2785380Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-09-07T06:13:37.2786482Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-09-07T06:13:37.2787671Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-09-07T06:13:37.2789736Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-09-07T06:13:37.2790785Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-09-07T06:13:37.2793315Z * [new branch] gh/zpcore/10/base -> origin/gh/zpcore/10/base 2025-09-07T06:13:37.2794281Z * [new 
branch] gh/zpcore/10/head -> origin/gh/zpcore/10/head 2025-09-07T06:13:37.2795437Z * [new branch] gh/zpcore/10/orig -> origin/gh/zpcore/10/orig 2025-09-07T06:13:37.2797145Z * [new branch] gh/zpcore/11/base -> origin/gh/zpcore/11/base 2025-09-07T06:13:37.2798311Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-09-07T06:13:37.2799495Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-09-07T06:13:37.2801338Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-09-07T06:13:37.2803170Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-09-07T06:13:37.2804346Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-09-07T06:13:37.2806199Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-09-07T06:13:37.2807370Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-09-07T06:13:37.2808514Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-09-07T06:13:37.2810140Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-09-07T06:13:37.2811271Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-09-07T06:13:37.2812877Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-09-07T06:13:37.2814232Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-09-07T06:13:37.2815664Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-09-07T06:13:37.2816716Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-09-07T06:13:37.2818163Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-09-07T06:13:37.2819283Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 2025-09-07T06:13:37.2820854Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-09-07T06:13:37.2822011Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-09-07T06:13:37.2823631Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-09-07T06:13:37.2824541Z * [new branch] gh/zpcore/6/head -> 
origin/gh/zpcore/6/head 2025-09-07T06:13:37.2826084Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-09-07T06:13:37.2827089Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-09-07T06:13:37.2828492Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-09-07T06:13:37.2829560Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-09-07T06:13:37.2830959Z * [new branch] google-main -> origin/google-main 2025-09-07T06:13:37.2832620Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-09-07T06:13:37.2833633Z * [new branch] guangyey/host_alloc -> origin/guangyey/host_alloc 2025-09-07T06:13:37.2834583Z * [new branch] guangyey/reimport -> origin/guangyey/reimport 2025-09-07T06:13:37.2835706Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-09-07T06:13:37.2837689Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-09-07T06:13:37.2839000Z * [new branch] haozhe/bf16-dynamic-shape -> origin/haozhe/bf16-dynamic-shape 2025-09-07T06:13:37.2840138Z * [new branch] hc_baseline -> origin/hc_baseline 2025-09-07T06:13:37.2841424Z * [new branch] hf_update -> origin/hf_update 2025-09-07T06:13:37.2842553Z * [new branch] hhh_decomp_mul -> origin/hhh_decomp_mul 2025-09-07T06:13:37.2843679Z * [new branch] hhh_rand -> origin/hhh_rand 2025-09-07T06:13:37.2845233Z * [new branch] hoy/mmsplitk -> origin/hoy/mmsplitk 2025-09-07T06:13:37.2846291Z * [new branch] hoy/triton-PR3973 -> origin/hoy/triton-PR3973 2025-09-07T06:13:37.2847539Z * [new branch] hoy/triton-coalescing-baseline -> origin/hoy/triton-coalescing-baseline 2025-09-07T06:13:37.2848541Z * [new branch] hoy/triton-coalescing-new -> origin/hoy/triton-coalescing-new 2025-09-07T06:13:37.2849630Z * [new branch] hoy/triton-coalescing-vec -> origin/hoy/triton-coalescing-vec 2025-09-07T06:13:37.2850989Z * [new branch] inductordecompfix -> origin/inductordecompfix 2025-09-07T06:13:37.2852097Z 
* [new branch] inline -> origin/inline 2025-09-07T06:13:37.2853763Z * [new branch] inlining -> origin/inlining 2025-09-07T06:13:37.2855049Z * [new branch] inlining-ezyang -> origin/inlining-ezyang 2025-09-07T06:13:37.2856262Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-09-07T06:13:37.2857378Z * [new branch] int8_sdpa -> origin/int8_sdpa 2025-09-07T06:13:37.2858574Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-09-07T06:13:37.2859952Z * [new branch] issue#58739 -> origin/issue#58739 2025-09-07T06:13:37.2861776Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-09-07T06:13:37.2862796Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-09-07T06:13:37.2864437Z * [new branch] jeanschmidt/disable_rocm_build_tests -> origin/jeanschmidt/disable_rocm_build_tests 2025-09-07T06:13:37.2865828Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-09-07T06:13:37.2866936Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-09-07T06:13:37.2868989Z * [new branch] justinchu/attention-tests -> origin/justinchu/attention-tests 2025-09-07T06:13:37.2870054Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-09-07T06:13:37.2871289Z * [new branch] justinchu/ort-122 -> origin/justinchu/ort-122 2025-09-07T06:13:37.2872890Z * [new branch] justinchuby/dynamo-true -> origin/justinchuby/dynamo-true 2025-09-07T06:13:37.2874434Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-09-07T06:13:37.2875526Z * [new branch] kainan_test -> origin/kainan_test 2025-09-07T06:13:37.2876674Z * [new branch] learnablebias -> origin/learnablebias 2025-09-07T06:13:37.2878359Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-09-07T06:13:37.2879869Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 
2025-09-07T06:13:37.2881242Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-09-07T06:13:37.2882389Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-09-07T06:13:37.2883393Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-09-07T06:13:37.2884516Z * [new branch] lintbuilddocker -> origin/lintbuilddocker 2025-09-07T06:13:37.2885636Z * [new branch] llama4-stable -> origin/llama4-stable 2025-09-07T06:13:37.2886943Z * [new branch] logdetfix -> origin/logdetfix 2025-09-07T06:13:37.2888932Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-09-07T06:13:37.2890488Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-09-07T06:13:37.2891521Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-09-07T06:13:37.2893112Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-09-07T06:13:37.2894401Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-09-07T06:13:37.2895633Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-09-07T06:13:37.2896595Z * [new branch] lucaskabela/issue_120648 -> origin/lucaskabela/issue_120648 2025-09-07T06:13:37.2898094Z * [new branch] lucaskabela/misc_typing_dynamo -> origin/lucaskabela/misc_typing_dynamo 2025-09-07T06:13:37.2899812Z * [new branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-09-07T06:13:37.2901005Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-09-07T06:13:37.2902016Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-09-07T06:13:37.2903259Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-09-07T06:13:37.2904584Z * [new branch] lucaskabela/typing_symbolic_convert -> 
origin/lucaskabela/typing_symbolic_convert 2025-09-07T06:13:37.2905746Z * [new branch] lucaskabela/typing_utils_improvements -> origin/lucaskabela/typing_utils_improvements 2025-09-07T06:13:37.2906915Z * [new branch] main -> origin/main 2025-09-07T06:13:37.2908425Z * [new branch] main-enable-b200-distributed-tests -> origin/main-enable-b200-distributed-tests 2025-09-07T06:13:37.2909539Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-09-07T06:13:37.2910838Z * [new branch] malfet-patch-12 -> origin/malfet-patch-12 2025-09-07T06:13:37.2912020Z * [new branch] malfet-patch-14 -> origin/malfet-patch-14 2025-09-07T06:13:37.2913326Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-09-07T06:13:37.2914615Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-09-07T06:13:37.2916503Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-09-07T06:13:37.2917456Z * [new branch] malfet/delete-upsteam-cuda -> origin/malfet/delete-upsteam-cuda 2025-09-07T06:13:37.2918594Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-09-07T06:13:37.2920240Z * [new branch] manuel/test-ops-common-allow-mps -> origin/manuel/test-ops-common-allow-mps 2025-09-07T06:13:37.2921399Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-09-07T06:13:37.2923054Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-09-07T06:13:37.2924095Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-09-07T06:13:37.2925175Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-09-07T06:13:37.2926273Z * [new branch] mlazos/backup-test-branch -> origin/mlazos/backup-test-branch 2025-09-07T06:13:37.2927295Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 2025-09-07T06:13:37.2928475Z * [new branch] mlazos/baseline -> origin/mlazos/baseline 2025-09-07T06:13:37.2929509Z * [new branch] mlazos/baseline-graph-breaks -> 
origin/mlazos/baseline-graph-breaks 2025-09-07T06:13:37.2930538Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-09-07T06:13:37.2931934Z * [new branch] mlazos/better-msg -> origin/mlazos/better-msg 2025-09-07T06:13:37.2933743Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-09-07T06:13:37.2934727Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-09-07T06:13:37.2936053Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-09-07T06:13:37.2937694Z * [new branch] mlazos/ck2 -> origin/mlazos/ck2 2025-09-07T06:13:37.2939081Z * [new branch] mlazos/combokernels -> origin/mlazos/combokernels 2025-09-07T06:13:37.2940233Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-09-07T06:13:37.2941258Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-09-07T06:13:37.2942631Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-09-07T06:13:37.2943802Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-09-07T06:13:37.2945085Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-09-07T06:13:37.2946223Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-09-07T06:13:37.2947358Z * [new branch] mlazos/data-gather -> origin/mlazos/data-gather 2025-09-07T06:13:37.2948975Z * [new branch] mlazos/data-ptrs2 -> origin/mlazos/data-ptrs2 2025-09-07T06:13:37.2949998Z * [new branch] mlazos/data-ptrs3 -> origin/mlazos/data-ptrs3 2025-09-07T06:13:37.2951160Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-09-07T06:13:37.2952246Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-09-07T06:13:37.2953458Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-09-07T06:13:37.2954384Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-09-07T06:13:37.2955600Z * [new branch] mlazos/disable-closures -> origin/mlazos/disable-closures 
2025-09-07T06:13:37.2956677Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-09-07T06:13:37.2957660Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-09-07T06:13:37.2959122Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-09-07T06:13:37.2960179Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-09-07T06:13:37.2961459Z * [new branch] mlazos/exp_disable -> origin/mlazos/exp_disable 2025-09-07T06:13:37.2962570Z * [new branch] mlazos/extract-examples -> origin/mlazos/extract-examples 2025-09-07T06:13:37.2963665Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-09-07T06:13:37.2964723Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-09-07T06:13:37.2965953Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-09-07T06:13:37.2967082Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-09-07T06:13:37.2968014Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-09-07T06:13:37.2969117Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-09-07T06:13:37.2970275Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-09-07T06:13:37.2972032Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-09-07T06:13:37.2973429Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-09-07T06:13:37.2974634Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-09-07T06:13:37.2975834Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-09-07T06:13:37.2976988Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-09-07T06:13:37.2978212Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-09-07T06:13:37.2979375Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-09-07T06:13:37.2980680Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-09-07T06:13:37.2981768Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-09-07T06:13:37.2983066Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 
2025-09-07T06:13:37.2984124Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-09-07T06:13:37.2985373Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-09-07T06:13:37.2986619Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-09-07T06:13:37.2987631Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-09-07T06:13:37.2988750Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-09-07T06:13:37.2990005Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-09-07T06:13:37.2991045Z * [new branch] mlazos/hc4 -> origin/mlazos/hc4 2025-09-07T06:13:37.2992308Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-09-07T06:13:37.2993796Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-09-07T06:13:37.2994958Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-09-07T06:13:37.2996015Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-09-07T06:13:37.2997447Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-09-07T06:13:37.2998512Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-09-07T06:13:37.2999690Z * [new branch] mlazos/init-per-param -> origin/mlazos/init-per-param 2025-09-07T06:13:37.3000807Z * [new branch] mlazos/init_per_param -> origin/mlazos/init_per_param 2025-09-07T06:13:37.3001968Z * [new branch] mlazos/less-guards -> origin/mlazos/less-guards 2025-09-07T06:13:37.3003169Z * [new branch] mlazos/lr-composibility -> origin/mlazos/lr-composibility 2025-09-07T06:13:37.3004135Z * [new branch] mlazos/main -> origin/mlazos/main 2025-09-07T06:13:37.3005604Z * [new branch] mlazos/main-test-enablement -> origin/mlazos/main-test-enablement 2025-09-07T06:13:37.3006613Z * [new branch] mlazos/main2 -> origin/mlazos/main2 2025-09-07T06:13:37.3007816Z * [new branch] mlazos/mark-static-update -> origin/mlazos/mark-static-update 2025-09-07T06:13:37.3008869Z * [new branch] mlazos/mcg -> origin/mlazos/mcg 2025-09-07T06:13:37.3009987Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-09-07T06:13:37.3011146Z * [new branch] mlazos/meta-guards -> 
origin/mlazos/meta-guards 2025-09-07T06:13:37.3012584Z * [new branch] mlazos/mlazos/ck2 -> origin/mlazos/mlazos/ck2 2025-09-07T06:13:37.3014267Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-09-07T06:13:37.3015467Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-09-07T06:13:37.3016575Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-09-07T06:13:37.3017774Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-09-07T06:13:37.3018978Z * [new branch] mlazos/more-tests -> origin/mlazos/more-tests 2025-09-07T06:13:37.3020094Z * [new branch] mlazos/no-cpp -> origin/mlazos/no-cpp 2025-09-07T06:13:37.3021446Z * [new branch] mlazos/no-init-group-handling -> origin/mlazos/no-init-group-handling 2025-09-07T06:13:37.3022531Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-09-07T06:13:37.3023726Z * [new branch] mlazos/opt-bench-exp2 -> origin/mlazos/opt-bench-exp2 2025-09-07T06:13:37.3024811Z * [new branch] mlazos/opt-incr -> origin/mlazos/opt-incr 2025-09-07T06:13:37.3026108Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-09-07T06:13:37.3027256Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-09-07T06:13:37.3028868Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-09-07T06:13:37.3030054Z * [new branch] mlazos/revert-inline -> origin/mlazos/revert-inline 2025-09-07T06:13:37.3031188Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-09-07T06:13:37.3032173Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-09-07T06:13:37.3033315Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-09-07T06:13:37.3034536Z * [new branch] mlazos/rtp -> origin/mlazos/rtp 2025-09-07T06:13:37.3035723Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-09-07T06:13:37.3036895Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 
2025-09-07T06:13:37.3038036Z * [new branch] mlazos/sub-param-fix -> origin/mlazos/sub-param-fix 2025-09-07T06:13:37.3039140Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-09-07T06:13:37.3040394Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-09-07T06:13:37.3041553Z * [new branch] mlazos/test -> origin/mlazos/test 2025-09-07T06:13:37.3042710Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-09-07T06:13:37.3043934Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-09-07T06:13:37.3045075Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-09-07T06:13:37.3046381Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-09-07T06:13:37.3047504Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-09-07T06:13:37.3048623Z * [new branch] mlazos/topo-fix -> origin/mlazos/topo-fix 2025-09-07T06:13:37.3049778Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-09-07T06:13:37.3050851Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-09-07T06:13:37.3052345Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-09-07T06:13:37.3053834Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-09-07T06:13:37.3055043Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-09-07T06:13:37.3056174Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-09-07T06:13:37.3057347Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-09-07T06:13:37.3058531Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-09-07T06:13:37.3059708Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-09-07T06:13:37.3061045Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-09-07T06:13:37.3062310Z * [new branch] modify-setupvllm -> origin/modify-setupvllm 2025-09-07T06:13:37.3063521Z * [new branch] module-shim 
-> origin/module-shim 2025-09-07T06:13:37.3064803Z * [new branch] move-theme-out-docker -> origin/move-theme-out-docker 2025-09-07T06:13:37.3066557Z * [new branch] msaroufim/be1 -> origin/msaroufim/be1 2025-09-07T06:13:37.3067722Z * [new branch] msaroufim/cn_path -> origin/msaroufim/cn_path 2025-09-07T06:13:37.3068943Z * [new branch] msaroufim/dtensorfusedadam -> origin/msaroufim/dtensorfusedadam 2025-09-07T06:13:37.3070064Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-09-07T06:13:37.3071739Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-09-07T06:13:37.3072901Z * [new branch] muon_dev -> origin/muon_dev 2025-09-07T06:13:37.3074110Z * [new branch] muon_dev_1 -> origin/muon_dev_1 2025-09-07T06:13:37.3075374Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-09-07T06:13:37.3076653Z * [new branch] nativert_numoutputs -> origin/nativert_numoutputs 2025-09-07T06:13:37.3077892Z * [new branch] new-modifiy-setupvllm -> origin/new-modifiy-setupvllm 2025-09-07T06:13:37.3079449Z * [new branch] new-setupvllm -> origin/new-setupvllm 2025-09-07T06:13:37.3080649Z * [new branch] new_zeros_dtype -> origin/new_zeros_dtype 2025-09-07T06:13:37.3081958Z * [new branch] newtest-base -> origin/newtest-base 2025-09-07T06:13:37.3083518Z * [new branch] ngimel/cat_perf1 -> origin/ngimel/cat_perf1 2025-09-07T06:13:37.3084666Z * [new branch] ngimel/einsum_fix -> origin/ngimel/einsum_fix 2025-09-07T06:13:37.3085684Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-09-07T06:13:37.3086702Z * [new branch] ngimel/fabric_check -> origin/ngimel/fabric_check 2025-09-07T06:13:37.3087745Z * [new branch] ngimel/fabric_fix -> origin/ngimel/fabric_fix 2025-09-07T06:13:37.3088891Z * [new branch] ngimel/fix_driver_init_error -> origin/ngimel/fix_driver_init_error 2025-09-07T06:13:37.3090346Z * [new branch] ngimel/fix_nccl_segment_seg -> origin/ngimel/fix_nccl_segment_seg 2025-09-07T06:13:37.3091577Z * [new branch] 
ngimel/gg_new -> origin/ngimel/gg_new 2025-09-07T06:13:37.3095007Z * [new branch] ngimel/modeguard -> origin/ngimel/modeguard 2025-09-07T06:13:37.3096832Z * [new branch] ngimel/multicast_fix -> origin/ngimel/multicast_fix 2025-09-07T06:13:37.3098158Z * [new branch] ngimel/rocm_handle_type -> origin/ngimel/rocm_handle_type 2025-09-07T06:13:37.3099414Z * [new branch] ngimel/symm_handle_fabric -> origin/ngimel/symm_handle_fabric 2025-09-07T06:13:37.3100665Z * [new branch] ngimel/unbind_multimem -> origin/ngimel/unbind_multimem 2025-09-07T06:13:37.3101940Z * [new branch] nightly -> origin/nightly 2025-09-07T06:13:37.3103364Z * [new branch] nmacchioni-patch-10 -> origin/nmacchioni-patch-10 2025-09-07T06:13:37.3104796Z * [new branch] nmacchioni-patch-7 -> origin/nmacchioni-patch-7 2025-09-07T06:13:37.3106158Z * [new branch] nmacchioni-patch-8 -> origin/nmacchioni-patch-8 2025-09-07T06:13:37.3107509Z * [new branch] nmacchioni-patch-9 -> origin/nmacchioni-patch-9 2025-09-07T06:13:37.3109120Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-09-07T06:13:37.3110347Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-09-07T06:13:37.3111482Z * [new branch] one-off -> origin/one-off 2025-09-07T06:13:37.3113483Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-09-07T06:13:37.3114733Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-09-07T06:13:37.3115911Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-09-07T06:13:37.3117336Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-09-07T06:13:37.3118592Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 2025-09-07T06:13:37.3120022Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-09-07T06:13:37.3121231Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-09-07T06:13:37.3122459Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-09-07T06:13:37.3123627Z * [new branch] 
orig/release/2.0 -> origin/orig/release/2.0 2025-09-07T06:13:37.3124829Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-09-07T06:13:37.3126162Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-09-07T06:13:37.3127298Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-09-07T06:13:37.3128802Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-09-07T06:13:37.3130011Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-09-07T06:13:37.3131113Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-09-07T06:13:37.3132565Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-09-07T06:13:37.3134712Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-09-07T06:13:37.3136314Z * [new branch] oulgen/fx_graph -> origin/oulgen/fx_graph 2025-09-07T06:13:37.3137573Z * [new branch] padded-tensor -> origin/padded-tensor 2025-09-07T06:13:37.3138858Z * [new branch] pca2 -> origin/pca2 2025-09-07T06:13:37.3140248Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-09-07T06:13:37.3142069Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-09-07T06:13:37.3143170Z * [new branch] pianpwk/invalidate_fake_memo -> origin/pianpwk/invalidate_fake_memo 2025-09-07T06:13:37.3144174Z * [new branch] pianpwk/max_1_strides -> origin/pianpwk/max_1_strides 2025-09-07T06:13:37.3145394Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-09-07T06:13:37.3146444Z * [new branch] pianpwk/nonzero_memo -> origin/pianpwk/nonzero_memo 2025-09-07T06:13:37.3148610Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-09-07T06:13:37.3150335Z * [new branch] pianpwk/oblivious_slice_forward -> origin/pianpwk/oblivious_slice_forward 2025-09-07T06:13:37.3150789Z * [new branch] pianpwk/oblivious_where -> origin/pianpwk/oblivious_where 2025-09-07T06:13:37.3151787Z * [new branch] 
pianpwk/param_static_pgo -> origin/pianpwk/param_static_pgo 2025-09-07T06:13:37.3152845Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-09-07T06:13:37.3154161Z * [new branch] pianpwk/remove_guard_fail_break -> origin/pianpwk/remove_guard_fail_break 2025-09-07T06:13:37.3155184Z * [new branch] pianpwk/slice_fresh_symbols -> origin/pianpwk/slice_fresh_symbols 2025-09-07T06:13:37.3156285Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-09-07T06:13:37.3157573Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-09-07T06:13:37.3158604Z * [new branch] pianpwk/test_slice_fake_impl -> origin/pianpwk/test_slice_fake_impl 2025-09-07T06:13:37.3159734Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-09-07T06:13:37.3161093Z * [new branch] pianpwk/unbacked_channels_last -> origin/pianpwk/unbacked_channels_last 2025-09-07T06:13:37.3162344Z * [new branch] pianpwk/unbacked_safe_conv1d -> origin/pianpwk/unbacked_safe_conv1d 2025-09-07T06:13:37.3163484Z * [new branch] pianpwk/unbacked_sdpa_flash -> origin/pianpwk/unbacked_sdpa_flash 2025-09-07T06:13:37.3164661Z * [new branch] pianpwk/unbacked_should_swap -> origin/pianpwk/unbacked_should_swap 2025-09-07T06:13:37.3165791Z * [new branch] pianpwk/unbacked_should_swap_2 -> origin/pianpwk/unbacked_should_swap_2 2025-09-07T06:13:37.3166998Z * [new branch] pianpwk/unbacked_slice_binding -> origin/pianpwk/unbacked_slice_binding 2025-09-07T06:13:37.3168161Z * [new branch] pianpwk/unbacked_slice_forward -> origin/pianpwk/unbacked_slice_forward 2025-09-07T06:13:37.3169195Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-09-07T06:13:37.3170373Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-09-07T06:13:37.3171493Z * [new branch] pianpwk/whitelist_optimizer -> origin/pianpwk/whitelist_optimizer 2025-09-07T06:13:37.3172804Z * [new branch] 
pin-torchao -> origin/pin-torchao 2025-09-07T06:13:37.3174829Z * [new branch] piz/fall_back_missing_0716 -> origin/piz/fall_back_missing_0716 2025-09-07T06:13:37.3176025Z * [new branch] piz/improve_scatter_0808 -> origin/piz/improve_scatter_0808 2025-09-07T06:13:37.3177167Z * [new branch] pool-separate -> origin/pool-separate 2025-09-07T06:13:37.3178410Z * [new branch] pr-156087 -> origin/pr-156087 2025-09-07T06:13:37.3180175Z * [new branch] pr/131860 -> origin/pr/131860 2025-09-07T06:13:37.3181489Z * [new branch] predispatch_to -> origin/predispatch_to 2025-09-07T06:13:37.3182721Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-09-07T06:13:37.3183950Z * [new branch] pyobjectslot -> origin/pyobjectslot 2025-09-07T06:13:37.3185759Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-09-07T06:13:37.3187885Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-09-07T06:13:37.3189053Z * [new branch] quint-bits -> origin/quint-bits 2025-09-07T06:13:37.3190753Z * [new branch] release/1.10 -> origin/release/1.10 2025-09-07T06:13:37.3192105Z * [new branch] release/1.11 -> origin/release/1.11 2025-09-07T06:13:37.3193716Z * [new branch] release/1.12 -> origin/release/1.12 2025-09-07T06:13:37.3195159Z * [new branch] release/1.13 -> origin/release/1.13 2025-09-07T06:13:37.3196319Z * [new branch] release/1.4 -> origin/release/1.4 2025-09-07T06:13:37.3197293Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-09-07T06:13:37.3198503Z * [new branch] release/1.5 -> origin/release/1.5 2025-09-07T06:13:37.3199771Z * [new branch] release/1.6 -> origin/release/1.6 2025-09-07T06:13:37.3201031Z * [new branch] release/1.7 -> origin/release/1.7 2025-09-07T06:13:37.3202356Z * [new branch] release/1.8 -> origin/release/1.8 2025-09-07T06:13:37.3203509Z * [new branch] release/1.9 -> origin/release/1.9 2025-09-07T06:13:37.3204996Z * [new branch] release/2.0 -> origin/release/2.0 2025-09-07T06:13:37.3206296Z * [new branch] 
release/2.1 -> origin/release/2.1 2025-09-07T06:13:37.3207451Z * [new branch] release/2.2 -> origin/release/2.2 2025-09-07T06:13:37.3208958Z * [new branch] release/2.3 -> origin/release/2.3 2025-09-07T06:13:37.3210463Z * [new branch] release/2.4 -> origin/release/2.4 2025-09-07T06:13:37.3212056Z * [new branch] release/2.5 -> origin/release/2.5 2025-09-07T06:13:37.3213690Z * [new branch] release/2.6 -> origin/release/2.6 2025-09-07T06:13:37.3214994Z * [new branch] release/2.7 -> origin/release/2.7 2025-09-07T06:13:37.3216268Z * [new branch] release/2.8 -> origin/release/2.8 2025-09-07T06:13:37.3217618Z * [new branch] release_notes -> origin/release_notes 2025-09-07T06:13:37.3218962Z * [new branch] remove-actionable-label -> origin/remove-actionable-label 2025-09-07T06:13:37.3220127Z * [new branch] remove-ao -> origin/remove-ao 2025-09-07T06:13:37.3222043Z * [new branch] removedeprecatedvllmtest -> origin/removedeprecatedvllmtest 2025-09-07T06:13:37.3223377Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-09-07T06:13:37.3224413Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-09-07T06:13:37.3225498Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-09-07T06:13:37.3227011Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-09-07T06:13:37.3228143Z * [new branch] replace-pytorch-labs-20250812-204125 -> origin/replace-pytorch-labs-20250812-204125 2025-09-07T06:13:37.3229311Z * [new branch] replace-pytorch-labs-20250812-205624 -> origin/replace-pytorch-labs-20250812-205624 2025-09-07T06:13:37.3231587Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-09-07T06:13:37.3233762Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-09-07T06:13:37.3235989Z 
* [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 2025-09-07T06:13:37.3237507Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-09-07T06:13:37.3238525Z * [new branch] rocm-monitoring -> origin/rocm-monitoring 2025-09-07T06:13:37.3240092Z * [new branch] ruisi/relax_memory -> origin/ruisi/relax_memory 2025-09-07T06:13:37.3241407Z * [new branch] run-torchbench-smoke-test-h100 -> origin/run-torchbench-smoke-test-h100 2025-09-07T06:13:37.3243646Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-09-07T06:13:37.3244624Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-09-07T06:13:37.3246693Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-09-07T06:13:37.3247720Z * [new branch] rzou/njt -> origin/rzou/njt 2025-09-07T06:13:37.3248895Z * [new branch] rzou/pca -> origin/rzou/pca 2025-09-07T06:13:37.3249997Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-09-07T06:13:37.3251040Z * [new branch] rzou/setup_context -> origin/rzou/setup_context 2025-09-07T06:13:37.3252972Z * [new branch] sanchitintel/refactor_aten_int8_woq_gemm -> origin/sanchitintel/refactor_aten_int8_woq_gemm 2025-09-07T06:13:37.3254662Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 2025-09-07T06:13:37.3255872Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-09-07T06:13:37.3257026Z * [new branch] save -> origin/save 2025-09-07T06:13:37.3258619Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-09-07T06:13:37.3260026Z * [new branch] seemethere-patch-1 -> origin/seemethere-patch-1 2025-09-07T06:13:37.3261208Z * [new branch] setupvllm -> origin/setupvllm 2025-09-07T06:13:37.3262474Z * [new branch] share_and_pin_fork -> 
origin/share_and_pin_fork 2025-09-07T06:13:37.3264074Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-09-07T06:13:37.3265483Z * [new branch] shikaili_fp8_allgather -> origin/shikaili_fp8_allgather 2025-09-07T06:13:37.3266889Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-09-07T06:13:37.3268098Z * [new branch] shoumikhin-patch-12 -> origin/shoumikhin-patch-12 2025-09-07T06:13:37.3269397Z * [new branch] simplify-fq-per-channel -> origin/simplify-fq-per-channel 2025-09-07T06:13:37.3270561Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-09-07T06:13:37.3272068Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-09-07T06:13:37.3273900Z * [new branch] sqzhang/flight4 -> origin/sqzhang/flight4 2025-09-07T06:13:37.3275471Z * [new branch] sqzhang/flight4plus -> origin/sqzhang/flight4plus 2025-09-07T06:13:37.3277012Z * [new branch] sraikund/record_funct_test -> origin/sraikund/record_funct_test 2025-09-07T06:13:37.3278546Z * [new branch] sraikund16/test -> origin/sraikund16/test 2025-09-07T06:13:37.3279871Z * [new branch] stablize-compilation-time -> origin/stablize-compilation-time 2025-09-07T06:13:37.3281113Z * [new branch] standalone-templates -> origin/standalone-templates 2025-09-07T06:13:37.3282349Z * [new branch] standalone_package_weights -> origin/standalone_package_weights 2025-09-07T06:13:37.3283520Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-09-07T06:13:37.3284742Z * [new branch] subgraph_fuse -> origin/subgraph_fuse 2025-09-07T06:13:37.3286142Z * [new branch] support-uv-in-collect_env -> origin/support-uv-in-collect_env 2025-09-07T06:13:37.3287213Z * [new branch] sve-poc -> origin/sve-poc 2025-09-07T06:13:37.3288445Z * [new branch] svekars-patch-1 -> origin/svekars-patch-1 2025-09-07T06:13:37.3289671Z * [new branch] switch-bn -> origin/switch-bn 2025-09-07T06:13:37.3291093Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 
2025-09-07T06:13:37.3293130Z * [new branch] tenpercent/ck_rocm_ci_v3 -> origin/tenpercent/ck_rocm_ci_v3 2025-09-07T06:13:37.3294486Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-09-07T06:13:37.3295658Z * [new branch] test-7054 -> origin/test-7054 2025-09-07T06:13:37.3297110Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-09-07T06:13:37.3298443Z * [new branch] test-myst-markdown-docstring -> origin/test-myst-markdown-docstring 2025-09-07T06:13:37.3299590Z * [new branch] test-old -> origin/test-old 2025-09-07T06:13:37.3300905Z * [new branch] test-vec-migration-internally -> origin/test-vec-migration-internally 2025-09-07T06:13:37.3302413Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-09-07T06:13:37.3303579Z * [new branch] test/inductor -> origin/test/inductor 2025-09-07T06:13:37.3305340Z * [new branch] tianren/flex_paged_attn_fix -> origin/tianren/flex_paged_attn_fix 2025-09-07T06:13:37.3306468Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-09-07T06:13:37.3307480Z * [new branch] tianren/test -> origin/tianren/test 2025-09-07T06:13:37.3308857Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-09-07T06:13:37.3310052Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-09-07T06:13:37.3311297Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-09-07T06:13:37.3312457Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-09-07T06:13:37.3314082Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-09-07T06:13:37.3315371Z * [new branch] tree_vec_base -> origin/tree_vec_base 2025-09-07T06:13:37.3316583Z * [new branch] triton-update -> origin/triton-update 2025-09-07T06:13:37.3317781Z * [new branch] triton_kernel -> origin/triton_kernel 2025-09-07T06:13:37.3318901Z * [new branch] triton_kernel_perf -> origin/triton_kernel_perf 2025-09-07T06:13:37.3320056Z 
* [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-09-07T06:13:37.3321585Z * [new branch] tweak-transformer-dependabot -> origin/tweak-transformer-dependabot 2025-09-07T06:13:37.3322564Z * [new branch] type_dec -> origin/type_dec 2025-09-07T06:13:37.3323922Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-09-07T06:13:37.3325666Z * [new branch] update-audio-commit-hash/16818882925-1712-1 -> origin/update-audio-commit-hash/16818882925-1712-1 2025-09-07T06:13:37.3326711Z * [new branch] update-audio-commit-hash/16895560422-1720-1 -> origin/update-audio-commit-hash/16895560422-1720-1 2025-09-07T06:13:37.3327921Z * [new branch] update-audio-commit-hash/16924174496-1738-1 -> origin/update-audio-commit-hash/16924174496-1738-1 2025-09-07T06:13:37.3329047Z * [new branch] update-audio-commit-hash/17002010821-1749-1 -> origin/update-audio-commit-hash/17002010821-1749-1 2025-09-07T06:13:37.3330185Z * [new branch] update-audio-commit-hash/17056004427-1766-1 -> origin/update-audio-commit-hash/17056004427-1766-1 2025-09-07T06:13:37.3331525Z * [new branch] update-audio-commit-hash/17085054029-1767-1 -> origin/update-audio-commit-hash/17085054029-1767-1 2025-09-07T06:13:37.3332960Z * [new branch] update-audio-commit-hash/17142507405-1771-1 -> origin/update-audio-commit-hash/17142507405-1771-1 2025-09-07T06:13:37.3334817Z * [new branch] update-audio-commit-hash/17168762740-1773-1 -> origin/update-audio-commit-hash/17168762740-1773-1 2025-09-07T06:13:37.3336012Z * [new branch] update-audio-commit-hash/17311174639-1780-1 -> origin/update-audio-commit-hash/17311174639-1780-1 2025-09-07T06:13:37.3337203Z * [new branch] update-audio-commit-hash/17336898740-1781-1 -> origin/update-audio-commit-hash/17336898740-1781-1 2025-09-07T06:13:37.3338342Z * [new branch] update-audio-commit-hash/17389727684-1786-1 -> origin/update-audio-commit-hash/17389727684-1786-1 2025-09-07T06:13:37.3339526Z * [new branch] update-audio-commit-hash/17449538142-1790-1 -> 
origin/update-audio-commit-hash/17449538142-1790-1 2025-09-07T06:13:37.3340696Z * [new branch] update-audio-commit-hash/17507351808-1794-1 -> origin/update-audio-commit-hash/17507351808-1794-1 2025-09-07T06:13:37.3341816Z * [new branch] update-dynamic-shapes-doc -> origin/update-dynamic-shapes-doc 2025-09-07T06:13:37.3343630Z * [new branch] update-executorch-commit-hash/15694981040-1626-1 -> origin/update-executorch-commit-hash/15694981040-1626-1 2025-09-07T06:13:37.3345231Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-09-07T06:13:37.3346855Z * [new branch] update-vision-commit-hash/15336342773-1607-1 -> origin/update-vision-commit-hash/15336342773-1607-1 2025-09-07T06:13:37.3348476Z * [new branch] update-vllm-commit-hash/16737365217-1704-1 -> origin/update-vllm-commit-hash/16737365217-1704-1 2025-09-07T06:13:37.3349489Z * [new branch] update-vllm-commit-hash/16843157111-1713-1 -> origin/update-vllm-commit-hash/16843157111-1713-1 2025-09-07T06:13:37.3350617Z * [new branch] update-vllm-commit-hash/16855312394-1714-1 -> origin/update-vllm-commit-hash/16855312394-1714-1 2025-09-07T06:13:37.3351608Z * [new branch] update-vllm-commit-hash/16924174496-1738-1 -> origin/update-vllm-commit-hash/16924174496-1738-1 2025-09-07T06:13:37.3352780Z * [new branch] update-vllm-commit-hash/16952608705-1745-1 -> origin/update-vllm-commit-hash/16952608705-1745-1 2025-09-07T06:13:37.3354083Z * [new branch] update-vllm-commit-hash/16979836546-1748-1 -> origin/update-vllm-commit-hash/16979836546-1748-1 2025-09-07T06:13:37.3355492Z * [new branch] update-vllm-commit-hash/17014576881-1756-1 -> origin/update-vllm-commit-hash/17014576881-1756-1 2025-09-07T06:13:37.3357024Z * [new branch] update-vllm-commit-hash/17027830869-1761-1 -> origin/update-vllm-commit-hash/17027830869-1761-1 2025-09-07T06:13:37.3358124Z * [new branch] update-vllm-commit-hash/17056004427-1766-1 -> origin/update-vllm-commit-hash/17056004427-1766-1 
2025-09-07T06:13:37.3359293Z * [new branch] update-vllm-commit-hash/17085054029-1767-1 -> origin/update-vllm-commit-hash/17085054029-1767-1 2025-09-07T06:13:37.3360453Z * [new branch] update-vllm-commit-hash/17113610216-1768-1 -> origin/update-vllm-commit-hash/17113610216-1768-1 2025-09-07T06:13:37.3361608Z * [new branch] update-vllm-commit-hash/17142507405-1771-1 -> origin/update-vllm-commit-hash/17142507405-1771-1 2025-09-07T06:13:37.3362748Z * [new branch] update-vllm-commit-hash/17181878974-1774-1 -> origin/update-vllm-commit-hash/17181878974-1774-1 2025-09-07T06:13:37.3363994Z * [new branch] update-vllm-commit-hash/17311174639-1780-1 -> origin/update-vllm-commit-hash/17311174639-1780-1 2025-09-07T06:13:37.3365145Z * [new branch] update-vllm-commit-hash/17336898740-1781-1 -> origin/update-vllm-commit-hash/17336898740-1781-1 2025-09-07T06:13:37.3366283Z * [new branch] update-vllm-commit-hash/17364352302-1785-1 -> origin/update-vllm-commit-hash/17364352302-1785-1 2025-09-07T06:13:37.3367343Z * [new branch] update-vllm-commit-hash/17389727684-1786-1 -> origin/update-vllm-commit-hash/17389727684-1786-1 2025-09-07T06:13:37.3368562Z * [new branch] update-vllm-commit-hash/17449538142-1790-1 -> origin/update-vllm-commit-hash/17449538142-1790-1 2025-09-07T06:13:37.3369730Z * [new branch] update-vllm-commit-hash/17480069797-1791-1 -> origin/update-vllm-commit-hash/17480069797-1791-1 2025-09-07T06:13:37.3370829Z * [new branch] update-vllm-commit-hash/17507351808-1794-1 -> origin/update-vllm-commit-hash/17507351808-1794-1 2025-09-07T06:13:37.3372419Z * [new branch] update-xla-commit-hash/16873912760-198-1 -> origin/update-xla-commit-hash/16873912760-198-1 2025-09-07T06:13:37.3373944Z * [new branch] update-xla-commit-hash/17034266655-199-1 -> origin/update-xla-commit-hash/17034266655-199-1 2025-09-07T06:13:37.3374975Z * [new branch] update-xla-commit-hash/17202464405-200-1 -> origin/update-xla-commit-hash/17202464405-200-1 2025-09-07T06:13:37.3376308Z * [new branch] 
update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-09-07T06:13:37.3377386Z * [new branch] update_executorch_pin -> origin/update_executorch_pin 2025-09-07T06:13:37.3378751Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-09-07T06:13:37.3380047Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-09-07T06:13:37.3381271Z * [new branch] update_slow_tests_1752478971 -> origin/update_slow_tests_1752478971 2025-09-07T06:13:37.3382619Z * [new branch] update_slow_tests_1755502951 -> origin/update_slow_tests_1755502951 2025-09-07T06:13:37.3383837Z * [new branch] update_slow_tests_1756107664 -> origin/update_slow_tests_1756107664 2025-09-07T06:13:37.3385210Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-09-07T06:13:37.3386420Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-09-07T06:13:37.3387659Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-09-07T06:13:37.3388914Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-09-07T06:13:37.3390265Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-09-07T06:13:37.3392173Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-09-07T06:13:37.3401086Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-09-07T06:13:37.3401613Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-09-07T06:13:37.3401835Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-09-07T06:13:37.3402032Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-09-07T06:13:37.3402240Z * [new branch] validate_fn -> origin/validate_fn 2025-09-07T06:13:37.3402481Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-09-07T06:13:37.3402711Z * [new branch] validations_2.8 -> origin/validations_2.8 2025-09-07T06:13:37.3403843Z * [new branch] viable/strict -> origin/viable/strict 2025-09-07T06:13:37.3405144Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-09-07T06:13:37.3406381Z * [new branch] 
vllmpin -> origin/vllmpin 2025-09-07T06:13:37.3408083Z * [new branch] wdvr/conda_devcontainer -> origin/wdvr/conda_devcontainer 2025-09-07T06:13:37.3409160Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-09-07T06:13:37.3410563Z * [new branch] weight_sharing_cpp -> origin/weight_sharing_cpp 2025-09-07T06:13:37.3413398Z * [new branch] whc/flight4 -> origin/whc/flight4 2025-09-07T06:13:37.3414661Z * [new branch] whc/flight51 -> origin/whc/flight51 2025-09-07T06:13:37.3415882Z * [new branch] whc/flight53 -> origin/whc/flight53 2025-09-07T06:13:37.3417164Z * [new branch] whc/stage2 -> origin/whc/stage2 2025-09-07T06:13:37.3418276Z * [new branch] whc/uneven -> origin/whc/uneven 2025-09-07T06:13:37.3419988Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-09-07T06:13:37.3421290Z * [new branch] win_warnings -> origin/win_warnings 2025-09-07T06:13:37.3422531Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-09-07T06:13:37.3423733Z * [new branch] workonoldcommit -> origin/workonoldcommit 2025-09-07T06:13:37.3425497Z * [new branch] wychi-autotune-prune-configs-by-shared-mem -> origin/wychi-autotune-prune-configs-by-shared-mem 2025-09-07T06:13:37.3426794Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-09-07T06:13:37.3427918Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-09-07T06:13:37.3429179Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 2025-09-07T06:13:37.3429976Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-09-07T06:13:37.3431087Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-09-07T06:13:37.3432185Z * [new branch] xmfan/ca_api -> origin/xmfan/ca_api 2025-09-07T06:13:37.3433690Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-09-07T06:13:37.3435075Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-09-07T06:13:37.3436642Z * [new branch] 
xmfan/ca_cudagraphs -> origin/xmfan/ca_cudagraphs 2025-09-07T06:13:37.3437738Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-09-07T06:13:37.3438865Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-09-07T06:13:37.3439992Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-09-07T06:13:37.3441107Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-09-07T06:13:37.3442187Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-09-07T06:13:37.3443463Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-09-07T06:13:37.3444514Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-09-07T06:13:37.3445635Z * [new branch] xmfan/ca_mem_base -> origin/xmfan/ca_mem_base 2025-09-07T06:13:37.3446713Z * [new branch] xmfan/ca_mem_fix -> origin/xmfan/ca_mem_fix 2025-09-07T06:13:37.3447839Z * [new branch] xmfan/ca_memory_fix -> origin/xmfan/ca_memory_fix 2025-09-07T06:13:37.3449491Z * [new branch] xmfan/ca_memory_fix_rebased -> origin/xmfan/ca_memory_fix_rebased 2025-09-07T06:13:37.3450705Z * [new branch] xmfan/ca_memory_fix_rebased2 -> origin/xmfan/ca_memory_fix_rebased2 2025-09-07T06:13:37.3451834Z * [new branch] xmfan/ca_move_to_cuda -> origin/xmfan/ca_move_to_cuda 2025-09-07T06:13:37.3453185Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-09-07T06:13:37.3454513Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-09-07T06:13:37.3455748Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-09-07T06:13:37.3456822Z * [new branch] xmfan/ca_scalar -> origin/xmfan/ca_scalar 2025-09-07T06:13:37.3458053Z * [new branch] xmfan/ca_subclass_mem_fix -> origin/xmfan/ca_subclass_mem_fix 2025-09-07T06:13:37.3459157Z * [new branch] xmfan/ca_warm_mem -> origin/xmfan/ca_warm_mem 2025-09-07T06:13:37.3460338Z * [new branch] xmfan/ca_warm_mem_base -> origin/xmfan/ca_warm_mem_base 2025-09-07T06:13:37.3461544Z * [new branch] xmfan/cacu_jun18 -> 
origin/xmfan/cacu_jun18 2025-09-07T06:13:37.3462736Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-09-07T06:13:37.3463842Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-09-07T06:13:37.3465422Z * [new branch] xmfan/cacu_may27 -> origin/xmfan/cacu_may27 2025-09-07T06:13:37.3466606Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-09-07T06:13:37.3467822Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-09-07T06:13:37.3468853Z * [new branch] xmfan/issue_123374 -> origin/xmfan/issue_123374 2025-09-07T06:13:37.3470235Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-09-07T06:13:37.3471394Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-09-07T06:13:37.3472321Z * [new branch] xmfan/segfault_test -> origin/xmfan/segfault_test 2025-09-07T06:13:37.3473387Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-09-07T06:13:37.3474507Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-09-07T06:13:37.3475670Z * [new branch] xmfan/test -> origin/xmfan/test 2025-09-07T06:13:37.3477743Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-09-07T06:13:37.3478781Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-09-07T06:13:37.3479980Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-09-07T06:13:37.3481148Z * [new branch] yihan_quantization -> origin/yihan_quantization 2025-09-07T06:13:37.3482761Z * [new branch] yiming/add_jit_trace_benchmark -> origin/yiming/add_jit_trace_benchmark 2025-09-07T06:13:37.3483811Z * [new branch] yiming/add_nativert_benchmark -> origin/yiming/add_nativert_benchmark 2025-09-07T06:13:37.3484843Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 
2025-09-07T06:13:37.3486286Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-09-07T06:13:37.3487555Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-09-07T06:13:37.3488584Z * [new branch] zainr/git-push-v2 -> origin/zainr/git-push-v2 2025-09-07T06:13:37.3489650Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-09-07T06:13:37.3490688Z * [new branch] zainr/test -> origin/zainr/test 2025-09-07T06:13:37.3491681Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-09-07T06:13:37.3494142Z * [new branch] zainr/unstable -> origin/zainr/unstable 2025-09-07T06:13:37.3495221Z * [new branch] zainr/unstable-xla -> origin/zainr/unstable-xla 2025-09-07T06:13:37.3496658Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-09-07T06:13:37.3497951Z * [new branch] zb2p -> origin/zb2p 2025-09-07T06:13:37.3499308Z * [new branch] zero_grad_optimization -> origin/zero_grad_optimization 2025-09-07T06:13:37.3500522Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-09-07T06:13:37.3502450Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-09-07T06:13:37.3504079Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-09-07T06:13:37.3505919Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-09-07T06:13:37.3507242Z * [new tag] bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug 2025-09-07T06:13:37.3507955Z * [new tag] ci/binaries/77164 -> ci/binaries/77164 2025-09-07T06:13:37.3508967Z * [new tag] ciflow/binaries/156049 -> ciflow/binaries/156049 2025-09-07T06:13:37.3509646Z * [new tag] ciflow/binaries/156712 -> ciflow/binaries/156712 2025-09-07T06:13:37.3510459Z * [new tag] ciflow/binaries/157432 -> ciflow/binaries/157432 2025-09-07T06:13:37.3511210Z * [new tag] ciflow/binaries/157685 -> ciflow/binaries/157685 2025-09-07T06:13:37.3511909Z * [new tag] ciflow/binaries/157689 -> 
ciflow/binaries/157689 2025-09-07T06:13:37.3512618Z * [new tag] ciflow/binaries/158104 -> ciflow/binaries/158104 2025-09-07T06:13:37.3513562Z * [new tag] ciflow/binaries/160229 -> ciflow/binaries/160229 2025-09-07T06:13:37.3514344Z * [new tag] ciflow/binaries/160720 -> ciflow/binaries/160720 2025-09-07T06:13:37.3515065Z * [new tag] ciflow/binaries/162080 -> ciflow/binaries/162080 2025-09-07T06:13:37.3515772Z * [new tag] ciflow/binaries/162329 -> ciflow/binaries/162329 2025-09-07T06:13:37.3516751Z * [new tag] ciflow/binaries_libtorch/156049 -> ciflow/binaries_libtorch/156049 2025-09-07T06:13:37.3517408Z * [new tag] ciflow/binaries_libtorch/156711 -> ciflow/binaries_libtorch/156711 2025-09-07T06:13:37.3518105Z * [new tag] ciflow/binaries_libtorch/157432 -> ciflow/binaries_libtorch/157432 2025-09-07T06:13:37.3518901Z * [new tag] ciflow/binaries_wheel/156049 -> ciflow/binaries_wheel/156049 2025-09-07T06:13:37.3519592Z * [new tag] ciflow/binaries_wheel/156711 -> ciflow/binaries_wheel/156711 2025-09-07T06:13:37.3520282Z * [new tag] ciflow/binaries_wheel/157432 -> ciflow/binaries_wheel/157432 2025-09-07T06:13:37.3520988Z * [new tag] ciflow/binaries_wheel/162136 -> ciflow/binaries_wheel/162136 2025-09-07T06:13:37.3521825Z * [new tag] ciflow/binaries_wheel/162252 -> ciflow/binaries_wheel/162252 2025-09-07T06:13:37.3522460Z * [new tag] ciflow/binaries_wheel/162325 -> ciflow/binaries_wheel/162325 2025-09-07T06:13:37.3523508Z * [new tag] ciflow/h100-distributed/156703 -> ciflow/h100-distributed/156703 2025-09-07T06:13:37.3524257Z * [new tag] ciflow/h100-symm-mem/157635 -> ciflow/h100-symm-mem/157635 2025-09-07T06:13:37.3524967Z * [new tag] ciflow/h100-symm-mem/161984 -> ciflow/h100-symm-mem/161984 2025-09-07T06:13:37.3525656Z * [new tag] ciflow/h100-symm-mem/162003 -> ciflow/h100-symm-mem/162003 2025-09-07T06:13:37.3526361Z * [new tag] ciflow/h100-symm-mem/162011 -> ciflow/h100-symm-mem/162011 2025-09-07T06:13:37.3527081Z * [new tag] ciflow/h100-symm-mem/162026 -> 
ciflow/h100-symm-mem/162026 2025-09-07T06:13:37.3527787Z * [new tag] ciflow/h100-symm-mem/162033 -> ciflow/h100-symm-mem/162033 2025-09-07T06:13:37.3528479Z * [new tag] ciflow/h100-symm-mem/162040 -> ciflow/h100-symm-mem/162040 2025-09-07T06:13:37.3529187Z * [new tag] ciflow/h100-symm-mem/162041 -> ciflow/h100-symm-mem/162041 2025-09-07T06:13:37.3529881Z * [new tag] ciflow/h100-symm-mem/162142 -> ciflow/h100-symm-mem/162142 2025-09-07T06:13:37.3530572Z * [new tag] ciflow/h100-symm-mem/162150 -> ciflow/h100-symm-mem/162150 2025-09-07T06:13:37.3531297Z * [new tag] ciflow/h100-symm-mem/162243 -> ciflow/h100-symm-mem/162243 2025-09-07T06:13:37.3532366Z * [new tag] ciflow/h100-symm-mem/162320 -> ciflow/h100-symm-mem/162320 2025-09-07T06:13:37.3533555Z * [new tag] ciflow/h100/159158 -> ciflow/h100/159158 2025-09-07T06:13:37.3534918Z * [new tag] ciflow/h100/160480 -> ciflow/h100/160480 2025-09-07T06:13:37.3535872Z * [new tag] ciflow/h100/161749 -> ciflow/h100/161749 2025-09-07T06:13:37.3536690Z * [new tag] ciflow/h100/162022 -> ciflow/h100/162022 2025-09-07T06:13:37.3537613Z * [new tag] ciflow/h100/162278 -> ciflow/h100/162278 2025-09-07T06:13:37.3538943Z * [new tag] ciflow/inductor-perf-test-nightly-rocm/156592 -> ciflow/inductor-perf-test-nightly-rocm/156592 2025-09-07T06:13:37.3539936Z * [new tag] ciflow/inductor-perf-test-nightly/156592 -> ciflow/inductor-perf-test-nightly/156592 2025-09-07T06:13:37.3540801Z * [new tag] ciflow/inductor-periodic/162063 -> ciflow/inductor-periodic/162063 2025-09-07T06:13:37.3541549Z * [new tag] ciflow/inductor-periodic/162227 -> ciflow/inductor-periodic/162227 2025-09-07T06:13:37.3542421Z * [new tag] ciflow/inductor-periodic/162323 -> ciflow/inductor-periodic/162323 2025-09-07T06:13:37.3543539Z * [new tag] ciflow/inductor-rocm/154170 -> ciflow/inductor-rocm/154170 2025-09-07T06:13:37.3544508Z * [new tag] ciflow/inductor-rocm/159146 -> ciflow/inductor-rocm/159146 2025-09-07T06:13:37.3545254Z * [new tag] ciflow/inductor-rocm/159158 -> 
ciflow/inductor-rocm/159158 2025-09-07T06:13:37.3546171Z * [new tag] ciflow/inductor-rocm/161715 -> ciflow/inductor-rocm/161715 2025-09-07T06:13:37.3547186Z * [new tag] ciflow/inductor-rocm/162053 -> ciflow/inductor-rocm/162053 2025-09-07T06:13:37.3547970Z * [new tag] ciflow/inductor-rocm/162056 -> ciflow/inductor-rocm/162056 2025-09-07T06:13:37.3548880Z * [new tag] ciflow/inductor/137400 -> ciflow/inductor/137400 2025-09-07T06:13:37.3549554Z * [new tag] ciflow/inductor/148180 -> ciflow/inductor/148180 2025-09-07T06:13:37.3550284Z * [new tag] ciflow/inductor/148328 -> ciflow/inductor/148328 2025-09-07T06:13:37.3550954Z * [new tag] ciflow/inductor/148484 -> ciflow/inductor/148484 2025-09-07T06:13:37.3551731Z * [new tag] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-09-07T06:13:37.3552374Z * [new tag] ciflow/inductor/152624 -> ciflow/inductor/152624 2025-09-07T06:13:37.3553076Z * [new tag] ciflow/inductor/154694 -> ciflow/inductor/154694 2025-09-07T06:13:37.3553756Z * [new tag] ciflow/inductor/156049 -> ciflow/inductor/156049 2025-09-07T06:13:37.3554428Z * [new tag] ciflow/inductor/156592 -> ciflow/inductor/156592 2025-09-07T06:13:37.3555130Z * [new tag] ciflow/inductor/157635 -> ciflow/inductor/157635 2025-09-07T06:13:37.3555858Z * [new tag] ciflow/inductor/157685 -> ciflow/inductor/157685 2025-09-07T06:13:37.3556881Z * [new tag] ciflow/inductor/157686 -> ciflow/inductor/157686 2025-09-07T06:13:37.3557900Z * [new tag] ciflow/inductor/157689 -> ciflow/inductor/157689 2025-09-07T06:13:37.3558928Z * [new tag] ciflow/inductor/157699 -> ciflow/inductor/157699 2025-09-07T06:13:37.3559861Z * [new tag] ciflow/inductor/157743 -> ciflow/inductor/157743 2025-09-07T06:13:37.3560673Z * [new tag] ciflow/inductor/157994 -> ciflow/inductor/157994 2025-09-07T06:13:37.3561422Z * [new tag] ciflow/inductor/158091 -> ciflow/inductor/158091 2025-09-07T06:13:37.3562164Z * [new tag] ciflow/inductor/158104 -> ciflow/inductor/158104 2025-09-07T06:13:37.3563068Z * [new tag] 
ciflow/inductor/158404 -> ciflow/inductor/158404 2025-09-07T06:13:37.3563776Z * [new tag] ciflow/inductor/158647 -> ciflow/inductor/158647 2025-09-07T06:13:37.3564716Z * [new tag] ciflow/inductor/158932 -> ciflow/inductor/158932 2025-09-07T06:13:37.3565413Z * [new tag] ciflow/inductor/159146 -> ciflow/inductor/159146 2025-09-07T06:13:37.3566136Z * [new tag] ciflow/inductor/159158 -> ciflow/inductor/159158 2025-09-07T06:13:37.3567101Z * [new tag] ciflow/inductor/159274 -> ciflow/inductor/159274 2025-09-07T06:13:37.3567828Z * [new tag] ciflow/inductor/159664 -> ciflow/inductor/159664 2025-09-07T06:13:37.3568745Z * [new tag] ciflow/inductor/159778 -> ciflow/inductor/159778 2025-09-07T06:13:37.3569446Z * [new tag] ciflow/inductor/159835 -> ciflow/inductor/159835 2025-09-07T06:13:37.3570415Z * [new tag] ciflow/inductor/159944 -> ciflow/inductor/159944 2025-09-07T06:13:37.3571296Z * [new tag] ciflow/inductor/160161 -> ciflow/inductor/160161 2025-09-07T06:13:37.3572006Z * [new tag] ciflow/inductor/160174 -> ciflow/inductor/160174 2025-09-07T06:13:37.3573207Z * [new tag] ciflow/inductor/160323 -> ciflow/inductor/160323 2025-09-07T06:13:37.3574434Z * [new tag] ciflow/inductor/160324 -> ciflow/inductor/160324 2025-09-07T06:13:37.3575390Z * [new tag] ciflow/inductor/160325 -> ciflow/inductor/160325 2025-09-07T06:13:37.3576413Z * [new tag] ciflow/inductor/160326 -> ciflow/inductor/160326 2025-09-07T06:13:37.3577187Z * [new tag] ciflow/inductor/160327 -> ciflow/inductor/160327 2025-09-07T06:13:37.3578121Z * [new tag] ciflow/inductor/160328 -> ciflow/inductor/160328 2025-09-07T06:13:37.3579058Z * [new tag] ciflow/inductor/160329 -> ciflow/inductor/160329 2025-09-07T06:13:37.3579783Z * [new tag] ciflow/inductor/160480 -> ciflow/inductor/160480 2025-09-07T06:13:37.3580842Z * [new tag] ciflow/inductor/160532 -> ciflow/inductor/160532 2025-09-07T06:13:37.3582345Z * [new tag] ciflow/inductor/160539 -> ciflow/inductor/160539 2025-09-07T06:13:37.3583158Z * [new tag] 
ciflow/inductor/160580 -> ciflow/inductor/160580 2025-09-07T06:13:37.3583878Z * [new tag] ciflow/inductor/160685 -> ciflow/inductor/160685 2025-09-07T06:13:37.3584646Z * [new tag] ciflow/inductor/160686 -> ciflow/inductor/160686 2025-09-07T06:13:37.3585627Z * [new tag] ciflow/inductor/160687 -> ciflow/inductor/160687 2025-09-07T06:13:37.3586396Z * [new tag] ciflow/inductor/160688 -> ciflow/inductor/160688 2025-09-07T06:13:37.3587314Z * [new tag] ciflow/inductor/160690 -> ciflow/inductor/160690 2025-09-07T06:13:37.3587993Z * [new tag] ciflow/inductor/160706 -> ciflow/inductor/160706 2025-09-07T06:13:37.3588734Z * [new tag] ciflow/inductor/160729 -> ciflow/inductor/160729 2025-09-07T06:13:37.3589675Z * [new tag] ciflow/inductor/160798 -> ciflow/inductor/160798 2025-09-07T06:13:37.3590529Z * [new tag] ciflow/inductor/160836 -> ciflow/inductor/160836 2025-09-07T06:13:37.3591243Z * [new tag] ciflow/inductor/160843 -> ciflow/inductor/160843 2025-09-07T06:13:37.3592924Z * [new tag] ciflow/inductor/160869 -> ciflow/inductor/160869 2025-09-07T06:13:37.3593755Z * [new tag] ciflow/inductor/160920 -> ciflow/inductor/160920 2025-09-07T06:13:37.3595078Z * [new tag] ciflow/inductor/160943 -> ciflow/inductor/160943 2025-09-07T06:13:37.3595807Z * [new tag] ciflow/inductor/161092 -> ciflow/inductor/161092 2025-09-07T06:13:37.3596596Z * [new tag] ciflow/inductor/161093 -> ciflow/inductor/161093 2025-09-07T06:13:37.3597569Z * [new tag] ciflow/inductor/161109 -> ciflow/inductor/161109 2025-09-07T06:13:37.3598325Z * [new tag] ciflow/inductor/161118 -> ciflow/inductor/161118 2025-09-07T06:13:37.3599333Z * [new tag] ciflow/inductor/161178 -> ciflow/inductor/161178 2025-09-07T06:13:37.3600121Z * [new tag] ciflow/inductor/161246 -> ciflow/inductor/161246 2025-09-07T06:13:37.3600907Z * [new tag] ciflow/inductor/161349 -> ciflow/inductor/161349 2025-09-07T06:13:37.3601714Z * [new tag] ciflow/inductor/161350 -> ciflow/inductor/161350 2025-09-07T06:13:37.3602463Z * [new tag] 
ciflow/inductor/161351 -> ciflow/inductor/161351 2025-09-07T06:13:37.3603490Z * [new tag] ciflow/inductor/161397 -> ciflow/inductor/161397 2025-09-07T06:13:37.3604356Z * [new tag] ciflow/inductor/161404 -> ciflow/inductor/161404 2025-09-07T06:13:37.3605247Z * [new tag] ciflow/inductor/161405 -> ciflow/inductor/161405 2025-09-07T06:13:37.3606130Z * [new tag] ciflow/inductor/161406 -> ciflow/inductor/161406 2025-09-07T06:13:37.3607127Z * [new tag] ciflow/inductor/161410 -> ciflow/inductor/161410 2025-09-07T06:13:37.3607829Z * [new tag] ciflow/inductor/161414 -> ciflow/inductor/161414 2025-09-07T06:13:37.3608893Z * [new tag] ciflow/inductor/161442 -> ciflow/inductor/161442 2025-09-07T06:13:37.3609574Z * [new tag] ciflow/inductor/161458 -> ciflow/inductor/161458 2025-09-07T06:13:37.3610301Z * [new tag] ciflow/inductor/161468 -> ciflow/inductor/161468 2025-09-07T06:13:37.3611077Z * [new tag] ciflow/inductor/161469 -> ciflow/inductor/161469 2025-09-07T06:13:37.3612039Z * [new tag] ciflow/inductor/161485 -> ciflow/inductor/161485 2025-09-07T06:13:37.3612782Z * [new tag] ciflow/inductor/161499 -> ciflow/inductor/161499 2025-09-07T06:13:37.3613862Z * [new tag] ciflow/inductor/161534 -> ciflow/inductor/161534 2025-09-07T06:13:37.3614610Z * [new tag] ciflow/inductor/161595 -> ciflow/inductor/161595 2025-09-07T06:13:37.3615650Z * [new tag] ciflow/inductor/161596 -> ciflow/inductor/161596 2025-09-07T06:13:37.3616918Z * [new tag] ciflow/inductor/161630 -> ciflow/inductor/161630 2025-09-07T06:13:37.3617663Z * [new tag] ciflow/inductor/161667 -> ciflow/inductor/161667 2025-09-07T06:13:37.3618441Z * [new tag] ciflow/inductor/161670 -> ciflow/inductor/161670 2025-09-07T06:13:37.3619241Z * [new tag] ciflow/inductor/161673 -> ciflow/inductor/161673 2025-09-07T06:13:37.3620008Z * [new tag] ciflow/inductor/161674 -> ciflow/inductor/161674 2025-09-07T06:13:37.3620792Z * [new tag] ciflow/inductor/161675 -> ciflow/inductor/161675 2025-09-07T06:13:37.3621603Z * [new tag] 
ciflow/inductor/161693 -> ciflow/inductor/161693 2025-09-07T06:13:37.3622387Z * [new tag] ciflow/inductor/161695 -> ciflow/inductor/161695 2025-09-07T06:13:37.3623167Z * [new tag] ciflow/inductor/161715 -> ciflow/inductor/161715 2025-09-07T06:13:37.3623957Z * [new tag] ciflow/inductor/161730 -> ciflow/inductor/161730 2025-09-07T06:13:37.3625041Z * [new tag] ciflow/inductor/161732 -> ciflow/inductor/161732 2025-09-07T06:13:37.3625979Z * [new tag] ciflow/inductor/161744 -> ciflow/inductor/161744 2025-09-07T06:13:37.3626695Z * [new tag] ciflow/inductor/161746 -> ciflow/inductor/161746 2025-09-07T06:13:37.3627451Z * [new tag] ciflow/inductor/161747 -> ciflow/inductor/161747 2025-09-07T06:13:37.3628225Z * [new tag] ciflow/inductor/161819 -> ciflow/inductor/161819 2025-09-07T06:13:37.3629004Z * [new tag] ciflow/inductor/161821 -> ciflow/inductor/161821 2025-09-07T06:13:37.3629768Z * [new tag] ciflow/inductor/161828 -> ciflow/inductor/161828 2025-09-07T06:13:37.3630556Z * [new tag] ciflow/inductor/161879 -> ciflow/inductor/161879 2025-09-07T06:13:37.3631293Z * [new tag] ciflow/inductor/161880 -> ciflow/inductor/161880 2025-09-07T06:13:37.3632055Z * [new tag] ciflow/inductor/161881 -> ciflow/inductor/161881 2025-09-07T06:13:37.3633061Z * [new tag] ciflow/inductor/161907 -> ciflow/inductor/161907 2025-09-07T06:13:37.3633783Z * [new tag] ciflow/inductor/161914 -> ciflow/inductor/161914 2025-09-07T06:13:37.3634724Z * [new tag] ciflow/inductor/161924 -> ciflow/inductor/161924 2025-09-07T06:13:37.3635657Z * [new tag] ciflow/inductor/161936 -> ciflow/inductor/161936 2025-09-07T06:13:37.3636355Z * [new tag] ciflow/inductor/161938 -> ciflow/inductor/161938 2025-09-07T06:13:37.3637158Z * [new tag] ciflow/inductor/161939 -> ciflow/inductor/161939 2025-09-07T06:13:37.3637933Z * [new tag] ciflow/inductor/161940 -> ciflow/inductor/161940 2025-09-07T06:13:37.3638691Z * [new tag] ciflow/inductor/161955 -> ciflow/inductor/161955 2025-09-07T06:13:37.3639469Z * [new tag] 
ciflow/inductor/161957 -> ciflow/inductor/161957 2025-09-07T06:13:37.3640257Z * [new tag] ciflow/inductor/161975 -> ciflow/inductor/161975 2025-09-07T06:13:37.3641005Z * [new tag] ciflow/inductor/161977 -> ciflow/inductor/161977 2025-09-07T06:13:37.3641964Z * [new tag] ciflow/inductor/161978 -> ciflow/inductor/161978 2025-09-07T06:13:37.3642677Z * [new tag] ciflow/inductor/161979 -> ciflow/inductor/161979 2025-09-07T06:13:37.3643434Z * [new tag] ciflow/inductor/161980 -> ciflow/inductor/161980 2025-09-07T06:13:37.3644311Z * [new tag] ciflow/inductor/161988 -> ciflow/inductor/161988 2025-09-07T06:13:37.3645115Z * [new tag] ciflow/inductor/161994 -> ciflow/inductor/161994 2025-09-07T06:13:37.3645785Z * [new tag] ciflow/inductor/162013 -> ciflow/inductor/162013 2025-09-07T06:13:37.3646529Z * [new tag] ciflow/inductor/162014 -> ciflow/inductor/162014 2025-09-07T06:13:37.3647265Z * [new tag] ciflow/inductor/162017 -> ciflow/inductor/162017 2025-09-07T06:13:37.3648469Z * [new tag] ciflow/inductor/162021 -> ciflow/inductor/162021 2025-09-07T06:13:37.3649187Z * [new tag] ciflow/inductor/162023 -> ciflow/inductor/162023 2025-09-07T06:13:37.3649901Z * [new tag] ciflow/inductor/162027 -> ciflow/inductor/162027 2025-09-07T06:13:37.3650692Z * [new tag] ciflow/inductor/162029 -> ciflow/inductor/162029 2025-09-07T06:13:37.3651409Z * [new tag] ciflow/inductor/162030 -> ciflow/inductor/162030 2025-09-07T06:13:37.3652144Z * [new tag] ciflow/inductor/162031 -> ciflow/inductor/162031 2025-09-07T06:13:37.3652972Z * [new tag] ciflow/inductor/162033 -> ciflow/inductor/162033 2025-09-07T06:13:37.3654334Z * [new tag] ciflow/inductor/162052 -> ciflow/inductor/162052 2025-09-07T06:13:37.3655106Z * [new tag] ciflow/inductor/162053 -> ciflow/inductor/162053 2025-09-07T06:13:37.3655904Z * [new tag] ciflow/inductor/162056 -> ciflow/inductor/162056 2025-09-07T06:13:37.3656698Z * [new tag] ciflow/inductor/162063 -> ciflow/inductor/162063 2025-09-07T06:13:37.3657513Z * [new tag] 
ciflow/inductor/162066 -> ciflow/inductor/162066 2025-09-07T06:13:37.3658292Z * [new tag] ciflow/inductor/162068 -> ciflow/inductor/162068 2025-09-07T06:13:37.3659336Z * [new tag] ciflow/inductor/162081 -> ciflow/inductor/162081 2025-09-07T06:13:37.3660102Z * [new tag] ciflow/inductor/162088 -> ciflow/inductor/162088 2025-09-07T06:13:37.3660932Z * [new tag] ciflow/inductor/162089 -> ciflow/inductor/162089 2025-09-07T06:13:37.3661690Z * [new tag] ciflow/inductor/162094 -> ciflow/inductor/162094 2025-09-07T06:13:37.3662613Z * [new tag] ciflow/inductor/162098 -> ciflow/inductor/162098 2025-09-07T06:13:37.3663468Z * [new tag] ciflow/inductor/162101 -> ciflow/inductor/162101 2025-09-07T06:13:37.3664279Z * [new tag] ciflow/inductor/162102 -> ciflow/inductor/162102 2025-09-07T06:13:37.3665057Z * [new tag] ciflow/inductor/162104 -> ciflow/inductor/162104 2025-09-07T06:13:37.3665953Z * [new tag] ciflow/inductor/162106 -> ciflow/inductor/162106 2025-09-07T06:13:37.3666702Z * [new tag] ciflow/inductor/162108 -> ciflow/inductor/162108 2025-09-07T06:13:37.3667475Z * [new tag] ciflow/inductor/162126 -> ciflow/inductor/162126 2025-09-07T06:13:37.3668391Z * [new tag] ciflow/inductor/162149 -> ciflow/inductor/162149 2025-09-07T06:13:37.3669092Z * [new tag] ciflow/inductor/162164 -> ciflow/inductor/162164 2025-09-07T06:13:37.3669852Z * [new tag] ciflow/inductor/162166 -> ciflow/inductor/162166 2025-09-07T06:13:37.3670567Z * [new tag] ciflow/inductor/162169 -> ciflow/inductor/162169 2025-09-07T06:13:37.3671306Z * [new tag] ciflow/inductor/162170 -> ciflow/inductor/162170 2025-09-07T06:13:37.3672070Z * [new tag] ciflow/inductor/162171 -> ciflow/inductor/162171 2025-09-07T06:13:37.3672795Z * [new tag] ciflow/inductor/162183 -> ciflow/inductor/162183 2025-09-07T06:13:37.3673536Z * [new tag] ciflow/inductor/162189 -> ciflow/inductor/162189 2025-09-07T06:13:37.3674285Z * [new tag] ciflow/inductor/162190 -> ciflow/inductor/162190 2025-09-07T06:13:37.3675079Z * [new tag] 
ciflow/inductor/162191 -> ciflow/inductor/162191 2025-09-07T06:13:37.3675759Z * [new tag] ciflow/inductor/162194 -> ciflow/inductor/162194 2025-09-07T06:13:37.3676772Z * [new tag] ciflow/inductor/162200 -> ciflow/inductor/162200 2025-09-07T06:13:37.3677465Z * [new tag] ciflow/inductor/162201 -> ciflow/inductor/162201 2025-09-07T06:13:37.3678208Z * [new tag] ciflow/inductor/162208 -> ciflow/inductor/162208 2025-09-07T06:13:37.3679185Z * [new tag] ciflow/inductor/162211 -> ciflow/inductor/162211 2025-09-07T06:13:37.3679900Z * [new tag] ciflow/inductor/162216 -> ciflow/inductor/162216 2025-09-07T06:13:37.3680660Z * [new tag] ciflow/inductor/162220 -> ciflow/inductor/162220 2025-09-07T06:13:37.3681588Z * [new tag] ciflow/inductor/162222 -> ciflow/inductor/162222 2025-09-07T06:13:37.3682293Z * [new tag] ciflow/inductor/162227 -> ciflow/inductor/162227 2025-09-07T06:13:37.3683049Z * [new tag] ciflow/inductor/162238 -> ciflow/inductor/162238 2025-09-07T06:13:37.3683763Z * [new tag] ciflow/inductor/162239 -> ciflow/inductor/162239 2025-09-07T06:13:37.3684519Z * [new tag] ciflow/inductor/162240 -> ciflow/inductor/162240 2025-09-07T06:13:37.3685299Z * [new tag] ciflow/inductor/162244 -> ciflow/inductor/162244 2025-09-07T06:13:37.3686048Z * [new tag] ciflow/inductor/162245 -> ciflow/inductor/162245 2025-09-07T06:13:37.3686788Z * [new tag] ciflow/inductor/162262 -> ciflow/inductor/162262 2025-09-07T06:13:37.3687565Z * [new tag] ciflow/inductor/162275 -> ciflow/inductor/162275 2025-09-07T06:13:37.3688296Z * [new tag] ciflow/inductor/162278 -> ciflow/inductor/162278 2025-09-07T06:13:37.3689044Z * [new tag] ciflow/inductor/162284 -> ciflow/inductor/162284 2025-09-07T06:13:37.3689802Z * [new tag] ciflow/inductor/162286 -> ciflow/inductor/162286 2025-09-07T06:13:37.3690531Z * [new tag] ciflow/inductor/162288 -> ciflow/inductor/162288 2025-09-07T06:13:37.3691287Z * [new tag] ciflow/inductor/162293 -> ciflow/inductor/162293 2025-09-07T06:13:37.3692136Z * [new tag] 
ciflow/inductor/162294 -> ciflow/inductor/162294 2025-09-07T06:13:37.3693698Z * [new tag] ciflow/inductor/162295 -> ciflow/inductor/162295 2025-09-07T06:13:37.3694483Z * [new tag] ciflow/inductor/162296 -> ciflow/inductor/162296 2025-09-07T06:13:37.3695304Z * [new tag] ciflow/inductor/162298 -> ciflow/inductor/162298 2025-09-07T06:13:37.3696085Z * [new tag] ciflow/inductor/162307 -> ciflow/inductor/162307 2025-09-07T06:13:37.3696915Z * [new tag] ciflow/inductor/162309 -> ciflow/inductor/162309 2025-09-07T06:13:37.3697686Z * [new tag] ciflow/inductor/162311 -> ciflow/inductor/162311 2025-09-07T06:13:37.3698621Z * [new tag] ciflow/inductor/162312 -> ciflow/inductor/162312 2025-09-07T06:13:37.3699353Z * [new tag] ciflow/inductor/162315 -> ciflow/inductor/162315 2025-09-07T06:13:37.3700577Z * [new tag] ciflow/inductor/162316 -> ciflow/inductor/162316 2025-09-07T06:13:37.3701331Z * [new tag] ciflow/inductor/162318 -> ciflow/inductor/162318 2025-09-07T06:13:37.3702160Z * [new tag] ciflow/inductor/162323 -> ciflow/inductor/162323 2025-09-07T06:13:37.3702933Z * [new tag] ciflow/inductor/162341 -> ciflow/inductor/162341 2025-09-07T06:13:37.3703858Z * [new tag] ciflow/inductor/162345 -> ciflow/inductor/162345 2025-09-07T06:13:37.3705020Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-09-07T06:13:37.3706203Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-09-07T06:13:37.3707203Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-09-07T06:13:37.3708118Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-09-07T06:13:37.3708784Z * [new tag] ciflow/linux-aarch64/159737 -> ciflow/linux-aarch64/159737 2025-09-07T06:13:37.3709499Z * [new tag] ciflow/linux-aarch64/160078 -> ciflow/linux-aarch64/160078 2025-09-07T06:13:37.3710373Z * [new tag] ciflow/mps/157553 -> ciflow/mps/157553 2025-09-07T06:13:37.3711040Z * [new tag] ciflow/mps/157635 -> ciflow/mps/157635 2025-09-07T06:13:37.3711746Z * [new tag] 
ciflow/mps/161988 -> ciflow/mps/161988 2025-09-07T06:13:37.3712475Z * [new tag] ciflow/mps/162108 -> ciflow/mps/162108 2025-09-07T06:13:37.3713195Z * [new tag] ciflow/mps/162153 -> ciflow/mps/162153 2025-09-07T06:13:37.3713921Z * [new tag] ciflow/mps/162281 -> ciflow/mps/162281 2025-09-07T06:13:37.3714794Z * [new tag] ciflow/nightly/156049 -> ciflow/nightly/156049 2025-09-07T06:13:37.3715502Z * [new tag] ciflow/nightly/158104 -> ciflow/nightly/158104 2025-09-07T06:13:37.3716471Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-09-07T06:13:37.3717625Z * [new tag] ciflow/periodic-rocm-mi300/161529 -> ciflow/periodic-rocm-mi300/161529 2025-09-07T06:13:37.3718315Z * [new tag] ciflow/periodic-rocm-mi300/161715 -> ciflow/periodic-rocm-mi300/161715 2025-09-07T06:13:37.3719348Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-09-07T06:13:37.3720021Z * [new tag] ciflow/periodic/156703 -> ciflow/periodic/156703 2025-09-07T06:13:37.3720722Z * [new tag] ciflow/periodic/161715 -> ciflow/periodic/161715 2025-09-07T06:13:37.3721401Z * [new tag] ciflow/periodic/162021 -> ciflow/periodic/162021 2025-09-07T06:13:37.3722102Z * [new tag] ciflow/periodic/162323 -> ciflow/periodic/162323 2025-09-07T06:13:37.3723067Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-09-07T06:13:37.3723965Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-09-07T06:13:37.3725283Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-09-07T06:13:37.3726263Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-09-07T06:13:37.3727243Z * [new tag] ciflow/periodic/94512-point -> ciflow/periodic/94512-point 2025-09-07T06:13:37.3728368Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 2025-09-07T06:13:37.3729553Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-09-07T06:13:37.3730656Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 
2025-09-07T06:13:37.3731713Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-09-07T06:13:37.3733034Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-09-07T06:13:37.3734425Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-09-07T06:13:37.3735399Z * [new tag] ciflow/rocm-mi300/154170 -> ciflow/rocm-mi300/154170 2025-09-07T06:13:37.3736199Z * [new tag] ciflow/rocm-mi300/158747 -> ciflow/rocm-mi300/158747 2025-09-07T06:13:37.3736962Z * [new tag] ciflow/rocm-mi300/159146 -> ciflow/rocm-mi300/159146 2025-09-07T06:13:37.3737792Z * [new tag] ciflow/rocm-mi300/159158 -> ciflow/rocm-mi300/159158 2025-09-07T06:13:37.3738432Z * [new tag] ciflow/rocm-mi300/161715 -> ciflow/rocm-mi300/161715 2025-09-07T06:13:37.3739153Z * [new tag] ciflow/rocm-mi300/161957 -> ciflow/rocm-mi300/161957 2025-09-07T06:13:37.3739908Z * [new tag] ciflow/rocm-mi300/162053 -> ciflow/rocm-mi300/162053 2025-09-07T06:13:37.3740669Z * [new tag] ciflow/rocm-mi300/162056 -> ciflow/rocm-mi300/162056 2025-09-07T06:13:37.3741576Z * [new tag] ciflow/rocm-mi300/162112 -> ciflow/rocm-mi300/162112 2025-09-07T06:13:37.3742304Z * [new tag] ciflow/rocm-mi300/162245 -> ciflow/rocm-mi300/162245 2025-09-07T06:13:37.3743114Z * [new tag] ciflow/rocm-mi300/162278 -> ciflow/rocm-mi300/162278 2025-09-07T06:13:37.3744232Z * [new tag] ciflow/rocm-mi300/162288 -> ciflow/rocm-mi300/162288 2025-09-07T06:13:37.3745281Z * [new tag] ciflow/rocm-mi355/162053 -> ciflow/rocm-mi355/162053 2025-09-07T06:13:37.3746166Z * [new tag] ciflow/rocm-mi355/162056 -> ciflow/rocm-mi355/162056 2025-09-07T06:13:37.3746884Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-09-07T06:13:37.3747633Z * [new tag] ciflow/rocm/154170 -> ciflow/rocm/154170 2025-09-07T06:13:37.3748616Z * [new tag] ciflow/rocm/156491 -> ciflow/rocm/156491 2025-09-07T06:13:37.3749274Z * [new tag] ciflow/rocm/156592 -> ciflow/rocm/156592 2025-09-07T06:13:37.3750011Z * [new tag] 
ciflow/rocm/158747 -> ciflow/rocm/158747 2025-09-07T06:13:37.3750731Z * [new tag] ciflow/rocm/159146 -> ciflow/rocm/159146 2025-09-07T06:13:37.3751735Z * [new tag] ciflow/rocm/159158 -> ciflow/rocm/159158 2025-09-07T06:13:37.3752431Z * [new tag] ciflow/rocm/161715 -> ciflow/rocm/161715 2025-09-07T06:13:37.3753240Z * [new tag] ciflow/rocm/161972 -> ciflow/rocm/161972 2025-09-07T06:13:37.3753983Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-09-07T06:13:37.3754747Z * [new tag] ciflow/rocm/162053 -> ciflow/rocm/162053 2025-09-07T06:13:37.3755738Z * [new tag] ciflow/rocm/162056 -> ciflow/rocm/162056 2025-09-07T06:13:37.3756727Z * [new tag] ciflow/rocm/162112 -> ciflow/rocm/162112 2025-09-07T06:13:37.3757678Z * [new tag] ciflow/rocm/162278 -> ciflow/rocm/162278 2025-09-07T06:13:37.3758359Z * [new tag] ciflow/rocm/162288 -> ciflow/rocm/162288 2025-09-07T06:13:37.3759123Z * [new tag] ciflow/rocm/162305 -> ciflow/rocm/162305 2025-09-07T06:13:37.3760234Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-09-07T06:13:37.3761249Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-09-07T06:13:37.3762601Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-09-07T06:13:37.3763066Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-09-07T06:13:37.3763827Z * [new tag] ciflow/slow/161395 -> ciflow/slow/161395 2025-09-07T06:13:37.3764775Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-09-07T06:13:37.3765733Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-09-07T06:13:37.3766568Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-09-07T06:13:37.3767810Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-09-07T06:13:37.3768999Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-09-07T06:13:37.3770106Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-09-07T06:13:37.3770864Z * [new tag] 
ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-09-07T06:13:37.3771819Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-09-07T06:13:37.3773508Z * [new tag] ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-09-07T06:13:37.3774169Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-09-07T06:13:37.3775144Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-09-07T06:13:37.3776051Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-09-07T06:13:37.3777054Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-09-07T06:13:37.3778048Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-09-07T06:13:37.3779523Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-09-07T06:13:37.3780116Z * [new tag] ciflow/triton_binaries/162329 -> ciflow/triton_binaries/162329 2025-09-07T06:13:37.3780961Z * [new tag] ciflow/trunk/113258 -> ciflow/trunk/113258 2025-09-07T06:13:37.3781716Z * [new tag] ciflow/trunk/137400 -> ciflow/trunk/137400 2025-09-07T06:13:37.3782462Z * [new tag] ciflow/trunk/148180 -> ciflow/trunk/148180 2025-09-07T06:13:37.3783192Z * [new tag] ciflow/trunk/148328 -> ciflow/trunk/148328 2025-09-07T06:13:37.3783944Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-09-07T06:13:37.3785035Z * [new tag] ciflow/trunk/148919 -> ciflow/trunk/148919 2025-09-07T06:13:37.3785823Z * [new tag] ciflow/trunk/152624 -> ciflow/trunk/152624 2025-09-07T06:13:37.3786555Z * [new tag] ciflow/trunk/154170 -> ciflow/trunk/154170 2025-09-07T06:13:37.3787253Z * [new tag] ciflow/trunk/154694 -> ciflow/trunk/154694 2025-09-07T06:13:37.3787960Z * [new tag] ciflow/trunk/156049 -> ciflow/trunk/156049 2025-09-07T06:13:37.3788683Z * [new tag] ciflow/trunk/156703 -> ciflow/trunk/156703 2025-09-07T06:13:37.3789717Z * [new tag] ciflow/trunk/156711 -> ciflow/trunk/156711 
2025-09-07T06:13:37.3790701Z * [new tag] ciflow/trunk/157432 -> ciflow/trunk/157432 2025-09-07T06:13:37.3791560Z * [new tag] ciflow/trunk/157685 -> ciflow/trunk/157685 2025-09-07T06:13:37.3792530Z * [new tag] ciflow/trunk/157689 -> ciflow/trunk/157689 2025-09-07T06:13:37.3793368Z * [new tag] ciflow/trunk/157699 -> ciflow/trunk/157699 2025-09-07T06:13:37.3794150Z * [new tag] ciflow/trunk/157813 -> ciflow/trunk/157813 2025-09-07T06:13:37.3794904Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-09-07T06:13:37.3795877Z * [new tag] ciflow/trunk/158091 -> ciflow/trunk/158091 2025-09-07T06:13:37.3796819Z * [new tag] ciflow/trunk/158104 -> ciflow/trunk/158104 2025-09-07T06:13:37.3797545Z * [new tag] ciflow/trunk/158404 -> ciflow/trunk/158404 2025-09-07T06:13:37.3798366Z * [new tag] ciflow/trunk/158647 -> ciflow/trunk/158647 2025-09-07T06:13:37.3799397Z * [new tag] ciflow/trunk/158846 -> ciflow/trunk/158846 2025-09-07T06:13:37.3800094Z * [new tag] ciflow/trunk/159158 -> ciflow/trunk/159158 2025-09-07T06:13:37.3801048Z * [new tag] ciflow/trunk/159682 -> ciflow/trunk/159682 2025-09-07T06:13:37.3801894Z * [new tag] ciflow/trunk/159835 -> ciflow/trunk/159835 2025-09-07T06:13:37.3802512Z * [new tag] ciflow/trunk/160161 -> ciflow/trunk/160161 2025-09-07T06:13:37.3803265Z * [new tag] ciflow/trunk/160236 -> ciflow/trunk/160236 2025-09-07T06:13:37.3804008Z * [new tag] ciflow/trunk/160329 -> ciflow/trunk/160329 2025-09-07T06:13:37.3804800Z * [new tag] ciflow/trunk/160480 -> ciflow/trunk/160480 2025-09-07T06:13:37.3805549Z * [new tag] ciflow/trunk/160532 -> ciflow/trunk/160532 2025-09-07T06:13:37.3806306Z * [new tag] ciflow/trunk/160836 -> ciflow/trunk/160836 2025-09-07T06:13:37.3807069Z * [new tag] ciflow/trunk/160843 -> ciflow/trunk/160843 2025-09-07T06:13:37.3807817Z * [new tag] ciflow/trunk/160869 -> ciflow/trunk/160869 2025-09-07T06:13:37.3808814Z * [new tag] ciflow/trunk/160940 -> ciflow/trunk/160940 2025-09-07T06:13:37.3809544Z * [new tag] ciflow/trunk/160943 -> 
ciflow/trunk/160943 2025-09-07T06:13:37.3810522Z * [new tag] ciflow/trunk/160953 -> ciflow/trunk/160953 2025-09-07T06:13:37.3811904Z * [new tag] ciflow/trunk/161035 -> ciflow/trunk/161035 2025-09-07T06:13:37.3812696Z * [new tag] ciflow/trunk/161178 -> ciflow/trunk/161178 2025-09-07T06:13:37.3813706Z * [new tag] ciflow/trunk/161349 -> ciflow/trunk/161349 2025-09-07T06:13:37.3814498Z * [new tag] ciflow/trunk/161350 -> ciflow/trunk/161350 2025-09-07T06:13:37.3815285Z * [new tag] ciflow/trunk/161351 -> ciflow/trunk/161351 2025-09-07T06:13:37.3816042Z * [new tag] ciflow/trunk/161395 -> ciflow/trunk/161395 2025-09-07T06:13:37.3816852Z * [new tag] ciflow/trunk/161405 -> ciflow/trunk/161405 2025-09-07T06:13:37.3817607Z * [new tag] ciflow/trunk/161406 -> ciflow/trunk/161406 2025-09-07T06:13:37.3818357Z * [new tag] ciflow/trunk/161410 -> ciflow/trunk/161410 2025-09-07T06:13:37.3819137Z * [new tag] ciflow/trunk/161468 -> ciflow/trunk/161468 2025-09-07T06:13:37.3819960Z * [new tag] ciflow/trunk/161499 -> ciflow/trunk/161499 2025-09-07T06:13:37.3821075Z * [new tag] ciflow/trunk/161527 -> ciflow/trunk/161527 2025-09-07T06:13:37.3821811Z * [new tag] ciflow/trunk/161534 -> ciflow/trunk/161534 2025-09-07T06:13:37.3822689Z * [new tag] ciflow/trunk/161591 -> ciflow/trunk/161591 2025-09-07T06:13:37.3823496Z * [new tag] ciflow/trunk/161595 -> ciflow/trunk/161595 2025-09-07T06:13:37.3824255Z * [new tag] ciflow/trunk/161596 -> ciflow/trunk/161596 2025-09-07T06:13:37.3825136Z * [new tag] ciflow/trunk/161633 -> ciflow/trunk/161633 2025-09-07T06:13:37.3825887Z * [new tag] ciflow/trunk/161634 -> ciflow/trunk/161634 2025-09-07T06:13:37.3826697Z * [new tag] ciflow/trunk/161635 -> ciflow/trunk/161635 2025-09-07T06:13:37.3827413Z * [new tag] ciflow/trunk/161667 -> ciflow/trunk/161667 2025-09-07T06:13:37.3828167Z * [new tag] ciflow/trunk/161670 -> ciflow/trunk/161670 2025-09-07T06:13:37.3828930Z * [new tag] ciflow/trunk/161692 -> ciflow/trunk/161692 2025-09-07T06:13:37.3829712Z * [new tag] 
ciflow/trunk/161693 -> ciflow/trunk/161693 2025-09-07T06:13:37.3830495Z * [new tag] ciflow/trunk/161695 -> ciflow/trunk/161695 2025-09-07T06:13:37.3831276Z * [new tag] ciflow/trunk/161730 -> ciflow/trunk/161730 2025-09-07T06:13:37.3832021Z * [new tag] ciflow/trunk/161744 -> ciflow/trunk/161744 2025-09-07T06:13:37.3832870Z * [new tag] ciflow/trunk/161749 -> ciflow/trunk/161749 2025-09-07T06:13:37.3833540Z * [new tag] ciflow/trunk/161881 -> ciflow/trunk/161881 2025-09-07T06:13:37.3834282Z * [new tag] ciflow/trunk/161924 -> ciflow/trunk/161924 2025-09-07T06:13:37.3835355Z * [new tag] ciflow/trunk/161926 -> ciflow/trunk/161926 2025-09-07T06:13:37.3836089Z * [new tag] ciflow/trunk/161936 -> ciflow/trunk/161936 2025-09-07T06:13:37.3836810Z * [new tag] ciflow/trunk/161952 -> ciflow/trunk/161952 2025-09-07T06:13:37.3837611Z * [new tag] ciflow/trunk/161955 -> ciflow/trunk/161955 2025-09-07T06:13:37.3838327Z * [new tag] ciflow/trunk/161957 -> ciflow/trunk/161957 2025-09-07T06:13:37.3839116Z * [new tag] ciflow/trunk/161959 -> ciflow/trunk/161959 2025-09-07T06:13:37.3839867Z * [new tag] ciflow/trunk/161977 -> ciflow/trunk/161977 2025-09-07T06:13:37.3840602Z * [new tag] ciflow/trunk/161988 -> ciflow/trunk/161988 2025-09-07T06:13:37.3841358Z * [new tag] ciflow/trunk/161994 -> ciflow/trunk/161994 2025-09-07T06:13:37.3842393Z * [new tag] ciflow/trunk/162007 -> ciflow/trunk/162007 2025-09-07T06:13:37.3843033Z * [new tag] ciflow/trunk/162013 -> ciflow/trunk/162013 2025-09-07T06:13:37.3843787Z * [new tag] ciflow/trunk/162017 -> ciflow/trunk/162017 2025-09-07T06:13:37.3844539Z * [new tag] ciflow/trunk/162021 -> ciflow/trunk/162021 2025-09-07T06:13:37.3845308Z * [new tag] ciflow/trunk/162022 -> ciflow/trunk/162022 2025-09-07T06:13:37.3846046Z * [new tag] ciflow/trunk/162040 -> ciflow/trunk/162040 2025-09-07T06:13:37.3846832Z * [new tag] ciflow/trunk/162041 -> ciflow/trunk/162041 2025-09-07T06:13:37.3847880Z * [new tag] ciflow/trunk/162062 -> ciflow/trunk/162062 
2025-09-07T06:13:37.3848619Z * [new tag] ciflow/trunk/162066 -> ciflow/trunk/162066 2025-09-07T06:13:37.3849364Z * [new tag] ciflow/trunk/162089 -> ciflow/trunk/162089 2025-09-07T06:13:37.3850232Z * [new tag] ciflow/trunk/162099 -> ciflow/trunk/162099 2025-09-07T06:13:37.3850980Z * [new tag] ciflow/trunk/162104 -> ciflow/trunk/162104 2025-09-07T06:13:37.3851738Z * [new tag] ciflow/trunk/162106 -> ciflow/trunk/162106 2025-09-07T06:13:37.3852438Z * [new tag] ciflow/trunk/162112 -> ciflow/trunk/162112 2025-09-07T06:13:37.3853655Z * [new tag] ciflow/trunk/162119 -> ciflow/trunk/162119 2025-09-07T06:13:37.3854370Z * [new tag] ciflow/trunk/162142 -> ciflow/trunk/162142 2025-09-07T06:13:37.3855148Z * [new tag] ciflow/trunk/162169 -> ciflow/trunk/162169 2025-09-07T06:13:37.3855933Z * [new tag] ciflow/trunk/162183 -> ciflow/trunk/162183 2025-09-07T06:13:37.3856695Z * [new tag] ciflow/trunk/162190 -> ciflow/trunk/162190 2025-09-07T06:13:37.3857465Z * [new tag] ciflow/trunk/162194 -> ciflow/trunk/162194 2025-09-07T06:13:37.3858257Z * [new tag] ciflow/trunk/162200 -> ciflow/trunk/162200 2025-09-07T06:13:37.3859023Z * [new tag] ciflow/trunk/162206 -> ciflow/trunk/162206 2025-09-07T06:13:37.3859794Z * [new tag] ciflow/trunk/162208 -> ciflow/trunk/162208 2025-09-07T06:13:37.3860628Z * [new tag] ciflow/trunk/162222 -> ciflow/trunk/162222 2025-09-07T06:13:37.3861407Z * [new tag] ciflow/trunk/162238 -> ciflow/trunk/162238 2025-09-07T06:13:37.3862430Z * [new tag] ciflow/trunk/162244 -> ciflow/trunk/162244 2025-09-07T06:13:37.3863787Z * [new tag] ciflow/trunk/162267 -> ciflow/trunk/162267 2025-09-07T06:13:37.3864616Z * [new tag] ciflow/trunk/162269 -> ciflow/trunk/162269 2025-09-07T06:13:37.3865610Z * [new tag] ciflow/trunk/162278 -> ciflow/trunk/162278 2025-09-07T06:13:37.3866392Z * [new tag] ciflow/trunk/162286 -> ciflow/trunk/162286 2025-09-07T06:13:37.3867137Z * [new tag] ciflow/trunk/162288 -> ciflow/trunk/162288 2025-09-07T06:13:37.3867881Z * [new tag] ciflow/trunk/162293 -> 
ciflow/trunk/162293 2025-09-07T06:13:37.3868627Z * [new tag] ciflow/trunk/162310 -> ciflow/trunk/162310 2025-09-07T06:13:37.3869347Z * [new tag] ciflow/trunk/162311 -> ciflow/trunk/162311 2025-09-07T06:13:37.3870083Z * [new tag] ciflow/trunk/162315 -> ciflow/trunk/162315 2025-09-07T06:13:37.3870817Z * [new tag] ciflow/trunk/162325 -> ciflow/trunk/162325 2025-09-07T06:13:37.3871764Z * [new tag] ciflow/trunk/162328 -> ciflow/trunk/162328 2025-09-07T06:13:37.3872469Z * [new tag] ciflow/trunk/162329 -> ciflow/trunk/162329 2025-09-07T06:13:37.3873744Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-09-07T06:13:37.3874572Z * [new tag] ciflow/vllm/162292 -> ciflow/vllm/162292 2025-09-07T06:13:37.3875510Z * [new tag] ciflow/win-arm64/156049 -> ciflow/win-arm64/156049 2025-09-07T06:13:37.3876162Z * [new tag] ciflow/win-arm64/158104 -> ciflow/win-arm64/158104 2025-09-07T06:13:37.3876988Z * [new tag] ciflow/xpu/157699 -> ciflow/xpu/157699 2025-09-07T06:13:37.3877677Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-09-07T06:13:37.3878662Z * [new tag] ciflow/xpu/159459 -> ciflow/xpu/159459 2025-09-07T06:13:37.3879331Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-09-07T06:13:37.3880014Z * [new tag] ciflow/xpu/159944 -> ciflow/xpu/159944 2025-09-07T06:13:37.3880819Z * [new tag] ciflow/xpu/160867 -> ciflow/xpu/160867 2025-09-07T06:13:37.3881636Z * [new tag] ciflow/xpu/160938 -> ciflow/xpu/160938 2025-09-07T06:13:37.3882318Z * [new tag] ciflow/xpu/160940 -> ciflow/xpu/160940 2025-09-07T06:13:37.3883006Z * [new tag] ciflow/xpu/160953 -> ciflow/xpu/160953 2025-09-07T06:13:37.3883878Z * [new tag] ciflow/xpu/161045 -> ciflow/xpu/161045 2025-09-07T06:13:37.3884815Z * [new tag] ciflow/xpu/161058 -> ciflow/xpu/161058 2025-09-07T06:13:37.3885811Z * [new tag] ciflow/xpu/161246 -> ciflow/xpu/161246 2025-09-07T06:13:37.3886859Z * [new tag] ciflow/xpu/161397 -> ciflow/xpu/161397 2025-09-07T06:13:37.3887762Z * [new tag] ciflow/xpu/161485 -> ciflow/xpu/161485 
2025-09-07T06:13:37.3888439Z * [new tag] ciflow/xpu/161988 -> ciflow/xpu/161988 2025-09-07T06:13:37.3889184Z * [new tag] ciflow/xpu/162062 -> ciflow/xpu/162062 2025-09-07T06:13:37.3890088Z * [new tag] cslpull75 -> cslpull75 2025-09-07T06:13:37.3890875Z * [new tag] cslpull76 -> cslpull76 2025-09-07T06:13:37.3891791Z * [new tag] cslpull77 -> cslpull77 2025-09-07T06:13:37.3893025Z * [new tag] cslpull78 -> cslpull78 2025-09-07T06:13:37.3894293Z * [new tag] cslpull79 -> cslpull79 2025-09-07T06:13:37.3895531Z * [new tag] cslpull80 -> cslpull80 2025-09-07T06:13:37.3896605Z * [new tag] cslpull81 -> cslpull81 2025-09-07T06:13:37.3897373Z * [new tag] cslpull82 -> cslpull82 2025-09-07T06:13:37.3898362Z * [new tag] cslpull83 -> cslpull83 2025-09-07T06:13:37.3899331Z * [new tag] cslpull84 -> cslpull84 2025-09-07T06:13:37.3900176Z * [new tag] cslpull85 -> cslpull85 2025-09-07T06:13:37.3901204Z * [new tag] cslpull86 -> cslpull86 2025-09-07T06:13:37.3902149Z * [new tag] cslpull87 -> cslpull87 2025-09-07T06:13:37.3903108Z * [new tag] cslpull88 -> cslpull88 2025-09-07T06:13:37.3903953Z * [new tag] cslpull89 -> cslpull89 2025-09-07T06:13:37.3904803Z * [new tag] cslpull90 -> cslpull90 2025-09-07T06:13:37.3906242Z * [new tag] cslpull91 -> cslpull91 2025-09-07T06:13:37.3907024Z * [new tag] cslpull92 -> cslpull92 2025-09-07T06:13:37.3907927Z * [new tag] flight_5 -> flight_5 2025-09-07T06:13:37.3908946Z * [new tag] flight_5.1 -> flight_5.1 2025-09-07T06:13:37.3909823Z * [new tag] flight_5.2 -> flight_5.2 2025-09-07T06:13:37.3910601Z * [new tag] flight_5.3 -> flight_5.3 2025-09-07T06:13:37.3911501Z * [new tag] forpull1 -> forpull1 2025-09-07T06:13:37.3912535Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-09-07T06:13:37.3913505Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-09-07T06:13:37.3914480Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-09-07T06:13:37.3915430Z * [new tag] nightly-binary -> nightly-binary 2025-09-07T06:13:37.3916103Z * [new tag] 
sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-09-07T06:13:37.3917092Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-09-07T06:13:37.3918396Z * [new tag] trunk/00636e0171e7e733628c408084805442270cf608 -> trunk/00636e0171e7e733628c408084805442270cf608 2025-09-07T06:13:37.3919265Z * [new tag] trunk/019fed39aa6b2dd8c69347378d53423e5efae8d4 -> trunk/019fed39aa6b2dd8c69347378d53423e5efae8d4 2025-09-07T06:13:37.3920472Z * [new tag] trunk/01ab325cc2e0dc221af4d710974e1b9175066544 -> trunk/01ab325cc2e0dc221af4d710974e1b9175066544 2025-09-07T06:13:37.3921526Z * [new tag] trunk/01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b -> trunk/01edcd4df8bf0c7b4cc2d3ec868bd2059eeea83b 2025-09-07T06:13:37.3922472Z * [new tag] trunk/040d00af048967dde7938d358d7f5988cbd18388 -> trunk/040d00af048967dde7938d358d7f5988cbd18388 2025-09-07T06:13:37.3923426Z * [new tag] trunk/0447f2d99b4351b2ff129dce6eebb371024f73e5 -> trunk/0447f2d99b4351b2ff129dce6eebb371024f73e5 2025-09-07T06:13:37.3924373Z * [new tag] trunk/047603d35bdc70046216384838d6340feab79bf4 -> trunk/047603d35bdc70046216384838d6340feab79bf4 2025-09-07T06:13:37.3925329Z * [new tag] trunk/06da7c0730b3764f178ec3a90dedf4ffa4202d81 -> trunk/06da7c0730b3764f178ec3a90dedf4ffa4202d81 2025-09-07T06:13:37.3926353Z * [new tag] trunk/081cab045472ce045634548cc6c14a4870641e23 -> trunk/081cab045472ce045634548cc6c14a4870641e23 2025-09-07T06:13:37.3927243Z * [new tag] trunk/09587daf8c9f21f5340f73921ce5f23d1a4a4572 -> trunk/09587daf8c9f21f5340f73921ce5f23d1a4a4572 2025-09-07T06:13:37.3928228Z * [new tag] trunk/09be1890d72cc34fc946965dc4a27736bf0ca8c6 -> trunk/09be1890d72cc34fc946965dc4a27736bf0ca8c6 2025-09-07T06:13:37.3929156Z * [new tag] trunk/09d2f1b6315d6d416fbf452793d65795863ebc66 -> trunk/09d2f1b6315d6d416fbf452793d65795863ebc66 2025-09-07T06:13:37.3930019Z * [new tag] trunk/0af70e2353e1dcda83175fd4834ecb7b63e009e0 -> trunk/0af70e2353e1dcda83175fd4834ecb7b63e009e0 2025-09-07T06:13:37.3931637Z * [new tag] 
trunk/0c0e056a9e20c17271a6144dd32c0c7e3ba26736 -> trunk/0c0e056a9e20c17271a6144dd32c0c7e3ba26736 2025-09-07T06:13:37.3932501Z * [new tag] trunk/0cd6c56bdfa9178ff61be82ce3b178926ddb64a9 -> trunk/0cd6c56bdfa9178ff61be82ce3b178926ddb64a9 2025-09-07T06:13:37.3933765Z * [new tag] trunk/0d421ace32c1605ee8e452ee1eeb03bd243dd96c -> trunk/0d421ace32c1605ee8e452ee1eeb03bd243dd96c 2025-09-07T06:13:37.3934871Z * [new tag] trunk/0d71a9dd5b4b6d1dde58d91c9b71d96bc6a6a171 -> trunk/0d71a9dd5b4b6d1dde58d91c9b71d96bc6a6a171 2025-09-07T06:13:37.3935784Z * [new tag] trunk/0d84ff3b78f55492d3d4708458c92d776274939e -> trunk/0d84ff3b78f55492d3d4708458c92d776274939e 2025-09-07T06:13:37.3936718Z * [new tag] trunk/0f45aaf4414048b17d720d0915ce221a8de8ec63 -> trunk/0f45aaf4414048b17d720d0915ce221a8de8ec63 2025-09-07T06:13:37.3937683Z * [new tag] trunk/0ff8eabf1387de5acd6712a03bda61f1a3dfa27f -> trunk/0ff8eabf1387de5acd6712a03bda61f1a3dfa27f 2025-09-07T06:13:37.3938590Z * [new tag] trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f -> trunk/104f2680e03d13a4765ca69f905d8f16fc0c822f 2025-09-07T06:13:37.3939582Z * [new tag] trunk/12814701555d3e41dfcdf8f9273af5821e322df0 -> trunk/12814701555d3e41dfcdf8f9273af5821e322df0 2025-09-07T06:13:37.3940538Z * [new tag] trunk/13b65196db422bdb394cb482e208c61ed448898c -> trunk/13b65196db422bdb394cb482e208c61ed448898c 2025-09-07T06:13:37.3941473Z * [new tag] trunk/13d66e2a66eceed14b8a8f5a971087df4f688a46 -> trunk/13d66e2a66eceed14b8a8f5a971087df4f688a46 2025-09-07T06:13:37.3942443Z * [new tag] trunk/145a3a7bda15e3963a33eb1b54bba5d4a270b225 -> trunk/145a3a7bda15e3963a33eb1b54bba5d4a270b225 2025-09-07T06:13:37.3943371Z * [new tag] trunk/146371483318e17929daefd37c8e459d9d6d47bb -> trunk/146371483318e17929daefd37c8e459d9d6d47bb 2025-09-07T06:13:37.3944376Z * [new tag] trunk/15c77a8cfd341e74fd124b077492ef2bfa51b339 -> trunk/15c77a8cfd341e74fd124b077492ef2bfa51b339 2025-09-07T06:13:37.3945438Z * [new tag] trunk/17fa8eec4a1e32939ab4d364ee6e75487a79b654 -> 
trunk/17fa8eec4a1e32939ab4d364ee6e75487a79b654 2025-09-07T06:13:37.3947070Z * [new tag] trunk/190c391a28845a14df26abb228d26aa813efb20c -> trunk/190c391a28845a14df26abb228d26aa813efb20c 2025-09-07T06:13:37.3948087Z * [new tag] trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 -> trunk/1a588ace4667bde1331fbd8ed957157dca5cee68 2025-09-07T06:13:37.3949008Z * [new tag] trunk/1aa7476885e8f6e7b0ec3a5b6383aad9d3f343e7 -> trunk/1aa7476885e8f6e7b0ec3a5b6383aad9d3f343e7 2025-09-07T06:13:37.3949790Z * [new tag] trunk/1aeb421c342c9e9607842f4c87cb46e8e816ee53 -> trunk/1aeb421c342c9e9607842f4c87cb46e8e816ee53 2025-09-07T06:13:37.3950663Z * [new tag] trunk/1c1b28d5b6a942fafe23b2f09302d93c25226d4a -> trunk/1c1b28d5b6a942fafe23b2f09302d93c25226d4a 2025-09-07T06:13:37.3951572Z * [new tag] trunk/1ebd70d0c0d562d3be9abdee2a21906584af7d99 -> trunk/1ebd70d0c0d562d3be9abdee2a21906584af7d99 2025-09-07T06:13:37.3952471Z * [new tag] trunk/1ec2c15914da4ef7bd926ed9aebc8671c75fe965 -> trunk/1ec2c15914da4ef7bd926ed9aebc8671c75fe965 2025-09-07T06:13:37.3953368Z * [new tag] trunk/1f51056bd64e73d1aa81321bc3c098575b1bc78a -> trunk/1f51056bd64e73d1aa81321bc3c098575b1bc78a 2025-09-07T06:13:37.3954319Z * [new tag] trunk/1f820de639c75a1562d3fb03f160439f853ae07b -> trunk/1f820de639c75a1562d3fb03f160439f853ae07b 2025-09-07T06:13:37.3955209Z * [new tag] trunk/204697f0e695d82894c5010fbec664c4391f90cc -> trunk/204697f0e695d82894c5010fbec664c4391f90cc 2025-09-07T06:13:37.3956138Z * [new tag] trunk/20629b1619fe636227d01fc85ba221daa7185a05 -> trunk/20629b1619fe636227d01fc85ba221daa7185a05 2025-09-07T06:13:37.3956946Z * [new tag] trunk/20b47acef845e9c4f71da9429a396d293f50ebe7 -> trunk/20b47acef845e9c4f71da9429a396d293f50ebe7 2025-09-07T06:13:37.3957824Z * [new tag] trunk/20bfb2539d7c5250379648eda35f80b8a7d642dd -> trunk/20bfb2539d7c5250379648eda35f80b8a7d642dd 2025-09-07T06:13:37.3958774Z * [new tag] trunk/21fae99c180d17def562797ea0fb154d8fdf88e3 -> trunk/21fae99c180d17def562797ea0fb154d8fdf88e3 
2025-09-07T06:13:37.3959784Z * [new tag] trunk/248355faf53f9f7ba2fd0a367d59600c6d991e7f -> trunk/248355faf53f9f7ba2fd0a367d59600c6d991e7f 2025-09-07T06:13:37.3961126Z * [new tag] trunk/25f4aaed9ec26f39c13862323ff8582006473d23 -> trunk/25f4aaed9ec26f39c13862323ff8582006473d23 2025-09-07T06:13:37.3962020Z * [new tag] trunk/261a84a1764412f8e659c956e3f81997ec3de9d5 -> trunk/261a84a1764412f8e659c956e3f81997ec3de9d5 2025-09-07T06:13:37.3963026Z * [new tag] trunk/28f4ab0737937858730f29f5c4e601e109cf9d5f -> trunk/28f4ab0737937858730f29f5c4e601e109cf9d5f 2025-09-07T06:13:37.3963954Z * [new tag] trunk/291cd11f2d5df6f48d348cce0e4e762f274f4dc4 -> trunk/291cd11f2d5df6f48d348cce0e4e762f274f4dc4 2025-09-07T06:13:37.3964861Z * [new tag] trunk/29280864d941e6108ab57f7298f520c0cf9696e9 -> trunk/29280864d941e6108ab57f7298f520c0cf9696e9 2025-09-07T06:13:37.3965783Z * [new tag] trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 -> trunk/2a45837e98c63cae9d1a2e2133a727b829e549d5 2025-09-07T06:13:37.3966705Z * [new tag] trunk/2a5c0785e2f975697fd7bdf1411de6e03dcaa1ef -> trunk/2a5c0785e2f975697fd7bdf1411de6e03dcaa1ef 2025-09-07T06:13:37.3967750Z * [new tag] trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c -> trunk/2b8a83901c58a0858ea9e4ce00055f48e6ed164c 2025-09-07T06:13:37.3969864Z * [new tag] trunk/2ba65472dd54488a86a50326ea990195fc6732d6 -> trunk/2ba65472dd54488a86a50326ea990195fc6732d6 2025-09-07T06:13:37.3970322Z * [new tag] trunk/2c03f0acc53ed13fe8ebfe809129f25996e009a0 -> trunk/2c03f0acc53ed13fe8ebfe809129f25996e009a0 2025-09-07T06:13:37.3971599Z * [new tag] trunk/2dd529df0092799f68ee7afcf52338276906706a -> trunk/2dd529df0092799f68ee7afcf52338276906706a 2025-09-07T06:13:37.3972064Z * [new tag] trunk/2f6b4b1ad3f82bb3bd984f6e65744ea339ffb8b5 -> trunk/2f6b4b1ad3f82bb3bd984f6e65744ea339ffb8b5 2025-09-07T06:13:37.3972874Z * [new tag] trunk/2fa0520a64ed8aa734a56c4d124958f0b5711ca8 -> trunk/2fa0520a64ed8aa734a56c4d124958f0b5711ca8 2025-09-07T06:13:37.3973651Z * [new tag] 
trunk/302df2ac5dc4222294c09d48804a2dddb8f4bad8 -> trunk/302df2ac5dc4222294c09d48804a2dddb8f4bad8 2025-09-07T06:13:37.3974758Z * [new tag] trunk/33028597bfa2e0178e28c8cce33cb9b3800cac43 -> trunk/33028597bfa2e0178e28c8cce33cb9b3800cac43 2025-09-07T06:13:37.3975694Z * [new tag] trunk/34aa78274d6770086025a967fa63a86830e08176 -> trunk/34aa78274d6770086025a967fa63a86830e08176 2025-09-07T06:13:37.3976622Z * [new tag] trunk/3559c354ce6a14d11fe29fb12fa2747a2f2af449 -> trunk/3559c354ce6a14d11fe29fb12fa2747a2f2af449 2025-09-07T06:13:37.3977416Z * [new tag] trunk/36d207fcaaede0d1e58a5168084c307b32b6fd8b -> trunk/36d207fcaaede0d1e58a5168084c307b32b6fd8b 2025-09-07T06:13:37.3978235Z * [new tag] trunk/377033757ae5ca524ea842f1b0a5f446ed3d8fe0 -> trunk/377033757ae5ca524ea842f1b0a5f446ed3d8fe0 2025-09-07T06:13:37.3979170Z * [new tag] trunk/3771380f83fcac154a7c89ad679311d8c4818287 -> trunk/3771380f83fcac154a7c89ad679311d8c4818287 2025-09-07T06:13:37.3980121Z * [new tag] trunk/3a207816cc569f78863d86c01f2a3d265350e39f -> trunk/3a207816cc569f78863d86c01f2a3d265350e39f 2025-09-07T06:13:37.3981244Z * [new tag] trunk/3a20a20e7065ec927fdd216d4da3b04f879b3c67 -> trunk/3a20a20e7065ec927fdd216d4da3b04f879b3c67 2025-09-07T06:13:37.3982094Z * [new tag] trunk/3bbc2e3e4f025523eaa5dbff220b3e96bca608d0 -> trunk/3bbc2e3e4f025523eaa5dbff220b3e96bca608d0 2025-09-07T06:13:37.3983053Z * [new tag] trunk/3c0ff1b569c45cfa6935ad8031a9d4cf1551aa3f -> trunk/3c0ff1b569c45cfa6935ad8031a9d4cf1551aa3f 2025-09-07T06:13:37.3983998Z * [new tag] trunk/3c45af079afc92a03b03ddf4f9198902ffcf30cf -> trunk/3c45af079afc92a03b03ddf4f9198902ffcf30cf 2025-09-07T06:13:37.3985048Z * [new tag] trunk/3dde5d7f9bf80dd6623a712bc429e9e4302464b5 -> trunk/3dde5d7f9bf80dd6623a712bc429e9e4302464b5 2025-09-07T06:13:37.3985990Z * [new tag] trunk/403a3a393cda7e60f503f3b04b8805a845dcf45d -> trunk/403a3a393cda7e60f503f3b04b8805a845dcf45d 2025-09-07T06:13:37.3986984Z * [new tag] trunk/420c52ecf36f86d32da0853bfbe074b682b070aa -> 
trunk/420c52ecf36f86d32da0853bfbe074b682b070aa 2025-09-07T06:13:37.3987884Z * [new tag] trunk/43b7c86a2c0f91320f5c5f4827b111edff06fdb6 -> trunk/43b7c86a2c0f91320f5c5f4827b111edff06fdb6 2025-09-07T06:13:37.3988739Z * [new tag] trunk/451ed931562ec8b46d1f7e6c266a68132a119336 -> trunk/451ed931562ec8b46d1f7e6c266a68132a119336 2025-09-07T06:13:37.3989703Z * [new tag] trunk/480c7391126656154318fabf1d57ebc01e196e63 -> trunk/480c7391126656154318fabf1d57ebc01e196e63 2025-09-07T06:13:37.3990863Z * [new tag] trunk/48bedd753da22634aa94fbafeb731e82025404f3 -> trunk/48bedd753da22634aa94fbafeb731e82025404f3 2025-09-07T06:13:37.3991585Z * [new tag] trunk/494878a11b79071ada0b98f34042d47155be6d1c -> trunk/494878a11b79071ada0b98f34042d47155be6d1c 2025-09-07T06:13:37.3992902Z * [new tag] trunk/4ae57d448c0a7d37e4cfd5c27d977fad2cef4051 -> trunk/4ae57d448c0a7d37e4cfd5c27d977fad2cef4051 2025-09-07T06:13:37.3993976Z * [new tag] trunk/4cdaf8265d86f984254b62052da8c26ef61ef1cf -> trunk/4cdaf8265d86f984254b62052da8c26ef61ef1cf 2025-09-07T06:13:37.3994788Z * [new tag] trunk/4d4abec80f03cd8fdefe1d9cb3a60d3690cd777e -> trunk/4d4abec80f03cd8fdefe1d9cb3a60d3690cd777e 2025-09-07T06:13:37.3995885Z * [new tag] trunk/4e42aa8ffc44b8340eb0eeaf80a2cafc4763a186 -> trunk/4e42aa8ffc44b8340eb0eeaf80a2cafc4763a186 2025-09-07T06:13:37.3996794Z * [new tag] trunk/4f72d932feee0749397fec876dcd43994f50b215 -> trunk/4f72d932feee0749397fec876dcd43994f50b215 2025-09-07T06:13:37.3997798Z * [new tag] trunk/50fc22dedf3c4a27be61fa05551c4f320281b42d -> trunk/50fc22dedf3c4a27be61fa05551c4f320281b42d 2025-09-07T06:13:37.3998779Z * [new tag] trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 -> trunk/5211f1f908907ffc064b56e43cf8659f7fc22aa9 2025-09-07T06:13:37.3999774Z * [new tag] trunk/524b78d4f67045b83bb69edc56ab16efe282971c -> trunk/524b78d4f67045b83bb69edc56ab16efe282971c 2025-09-07T06:13:37.4000772Z * [new tag] trunk/54e275e0d81fe1e1ccfa4fb5f2a5a9aaca00ca15 -> trunk/54e275e0d81fe1e1ccfa4fb5f2a5a9aaca00ca15 
2025-09-07T06:13:37.4001595Z * [new tag] trunk/5561e45758d59c94605873d5db48ed459c004c3b -> trunk/5561e45758d59c94605873d5db48ed459c004c3b 2025-09-07T06:13:37.4002817Z * [new tag] trunk/57278d45f046d4f89f45d373b1af4dd56934ff24 -> trunk/57278d45f046d4f89f45d373b1af4dd56934ff24 2025-09-07T06:13:37.4003728Z * [new tag] trunk/5927a70934ccf7b70182d364c23245a7dd685503 -> trunk/5927a70934ccf7b70182d364c23245a7dd685503 2025-09-07T06:13:37.4004767Z * [new tag] trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 -> trunk/5985e28912aeb40b103ebfcf2fd0665eb4a50599 2025-09-07T06:13:37.4005769Z * [new tag] trunk/5a2da090ed6db88bb657c4e51ec0b310cd08bff6 -> trunk/5a2da090ed6db88bb657c4e51ec0b310cd08bff6 2025-09-07T06:13:37.4006836Z * [new tag] trunk/5c473e9f5ee0ef0fc38e6cf34a95b547f8cdc8d5 -> trunk/5c473e9f5ee0ef0fc38e6cf34a95b547f8cdc8d5 2025-09-07T06:13:37.4007601Z * [new tag] trunk/5c67426d6847667a7c55a2dd01f470fa37238c18 -> trunk/5c67426d6847667a7c55a2dd01f470fa37238c18 2025-09-07T06:13:37.4008551Z * [new tag] trunk/5da573c42c332bc68d4b7946c69f690a876d951a -> trunk/5da573c42c332bc68d4b7946c69f690a876d951a 2025-09-07T06:13:37.4009453Z * [new tag] trunk/5e5870e858f60ff4bf87d03f3592097e934a9580 -> trunk/5e5870e858f60ff4bf87d03f3592097e934a9580 2025-09-07T06:13:37.4010402Z * [new tag] trunk/5f3cbc9442aa55b5afb29f4ac8ca9be569003e84 -> trunk/5f3cbc9442aa55b5afb29f4ac8ca9be569003e84 2025-09-07T06:13:37.4011329Z * [new tag] trunk/600c25e9a17fe56e3dee872be8854db08916ba0c -> trunk/600c25e9a17fe56e3dee872be8854db08916ba0c 2025-09-07T06:13:37.4012308Z * [new tag] trunk/601ae8e4831fc8123fffcfb8fd2e6b6381b42e14 -> trunk/601ae8e4831fc8123fffcfb8fd2e6b6381b42e14 2025-09-07T06:13:37.4013569Z * [new tag] trunk/6087ef41e54c2494b117ffd923faf20f515a6806 -> trunk/6087ef41e54c2494b117ffd923faf20f515a6806 2025-09-07T06:13:37.4014560Z * [new tag] trunk/626cb7df8161dd4ecb4fe43b60f37ce9076f56b1 -> trunk/626cb7df8161dd4ecb4fe43b60f37ce9076f56b1 2025-09-07T06:13:37.4015491Z * [new tag] 
trunk/62c3f9a97fd3dea7132a93066d32d893ffe101e6 -> trunk/62c3f9a97fd3dea7132a93066d32d893ffe101e6 2025-09-07T06:13:37.4016461Z * [new tag] trunk/63a9c23fe99eacfd09610c36dfe8f01b053c1a35 -> trunk/63a9c23fe99eacfd09610c36dfe8f01b053c1a35 2025-09-07T06:13:37.4017422Z * [new tag] trunk/65985937d97505f648b6ed852c3129f2dd08b251 -> trunk/65985937d97505f648b6ed852c3129f2dd08b251 2025-09-07T06:13:37.4019116Z * [new tag] trunk/66f3b4a682a6153517dd23369fdc3289b6494b07 -> trunk/66f3b4a682a6153517dd23369fdc3289b6494b07 2025-09-07T06:13:37.4019866Z * [new tag] trunk/6737e2c996990024187ba620d2764f3b6f6add2c -> trunk/6737e2c996990024187ba620d2764f3b6f6add2c 2025-09-07T06:13:37.4020856Z * [new tag] trunk/67c31dcd364f10072a55f4a30ffd1151c686283a -> trunk/67c31dcd364f10072a55f4a30ffd1151c686283a 2025-09-07T06:13:37.4021875Z * [new tag] trunk/68738beff73e9c3512e18b4edea811a897ce42db -> trunk/68738beff73e9c3512e18b4edea811a897ce42db 2025-09-07T06:13:37.4023079Z * [new tag] trunk/69a25f68884a168550695fdb1a7c310c54d29536 -> trunk/69a25f68884a168550695fdb1a7c310c54d29536 2025-09-07T06:13:37.4024361Z * [new tag] trunk/6b1900c22f1a07b9519346898d4c71d8a2b0f12f -> trunk/6b1900c22f1a07b9519346898d4c71d8a2b0f12f 2025-09-07T06:13:37.4025405Z * [new tag] trunk/6b8b3ac4403f771bd4a8f9a45d93347304148774 -> trunk/6b8b3ac4403f771bd4a8f9a45d93347304148774 2025-09-07T06:13:37.4026305Z * [new tag] trunk/6f7608d603834d6068b2e7a5d59bec3973b6bb1b -> trunk/6f7608d603834d6068b2e7a5d59bec3973b6bb1b 2025-09-07T06:13:37.4027298Z * [new tag] trunk/70d36e047dfb3488fd6335016711a784d810ebda -> trunk/70d36e047dfb3488fd6335016711a784d810ebda 2025-09-07T06:13:37.4028208Z * [new tag] trunk/71992dd805ff9d6763f77214dfe8b0465e88c87b -> trunk/71992dd805ff9d6763f77214dfe8b0465e88c87b 2025-09-07T06:13:37.4029104Z * [new tag] trunk/734ce8eba9c69381f187359bf0fef1d71d84cd20 -> trunk/734ce8eba9c69381f187359bf0fef1d71d84cd20 2025-09-07T06:13:37.4030085Z * [new tag] trunk/73eb4511fb863a37944342b7e92aae706de603c8 -> 
trunk/73eb4511fb863a37944342b7e92aae706de603c8 2025-09-07T06:13:37.4031050Z * [new tag] trunk/75bc23cfc345bd4c05e7f97c416c4b3d2d1fa64b -> trunk/75bc23cfc345bd4c05e7f97c416c4b3d2d1fa64b 2025-09-07T06:13:37.4031984Z * [new tag] trunk/771f369448321a387f2018535bc8b8b6e5f12fab -> trunk/771f369448321a387f2018535bc8b8b6e5f12fab 2025-09-07T06:13:37.4033001Z * [new tag] trunk/789d4942127143f2adcb53612c058ce4c9a2cf20 -> trunk/789d4942127143f2adcb53612c058ce4c9a2cf20 2025-09-07T06:13:37.4033832Z * [new tag] trunk/791eff96c85678c950888f9da24650083ee673fe -> trunk/791eff96c85678c950888f9da24650083ee673fe 2025-09-07T06:13:37.4034550Z * [new tag] trunk/793fc12aff1f69fbbf9f4278182fb52bbe350fc9 -> trunk/793fc12aff1f69fbbf9f4278182fb52bbe350fc9 2025-09-07T06:13:37.4035467Z * [new tag] trunk/79fcd5247a9a129eee526a14df30bfc6a22b3f01 -> trunk/79fcd5247a9a129eee526a14df30bfc6a22b3f01 2025-09-07T06:13:37.4036396Z * [new tag] trunk/7f4ff79210eb06924f223ae3a1941ee0e2635348 -> trunk/7f4ff79210eb06924f223ae3a1941ee0e2635348 2025-09-07T06:13:37.4037348Z * [new tag] trunk/8076a185c85112be62be292eb47409c88a585b1c -> trunk/8076a185c85112be62be292eb47409c88a585b1c 2025-09-07T06:13:37.4038240Z * [new tag] trunk/80dd397f1979371a5583fa3d5c7352029522a78d -> trunk/80dd397f1979371a5583fa3d5c7352029522a78d 2025-09-07T06:13:37.4039035Z * [new tag] trunk/8171d6052ec12628eb67e0040839314056014429 -> trunk/8171d6052ec12628eb67e0040839314056014429 2025-09-07T06:13:37.4039961Z * [new tag] trunk/81aeefa657b7ccc26b275c50a9f33b2f056e8071 -> trunk/81aeefa657b7ccc26b275c50a9f33b2f056e8071 2025-09-07T06:13:37.4040856Z * [new tag] trunk/81b7b16618bda250ce55982894a83dc0805eb64c -> trunk/81b7b16618bda250ce55982894a83dc0805eb64c 2025-09-07T06:13:37.4041816Z * [new tag] trunk/827f0d405448de31f79d1089f7d7fceab2f87895 -> trunk/827f0d405448de31f79d1089f7d7fceab2f87895 2025-09-07T06:13:37.4042796Z * [new tag] trunk/82f63c8f6de63c30132a8ac299b6e8c2fd0d3fe8 -> trunk/82f63c8f6de63c30132a8ac299b6e8c2fd0d3fe8 
2025-09-07T06:13:37.4043704Z * [new tag] trunk/850e1382a9c56bfde18af09d3e72352d775e9435 -> trunk/850e1382a9c56bfde18af09d3e72352d775e9435 2025-09-07T06:13:37.4044876Z * [new tag] trunk/8678d831c48e616b717bff50f2d03141d2e9f965 -> trunk/8678d831c48e616b717bff50f2d03141d2e9f965 2025-09-07T06:13:37.4045776Z * [new tag] trunk/869cbcc16e489a4f5a14a93d5779b0ea86061c60 -> trunk/869cbcc16e489a4f5a14a93d5779b0ea86061c60 2025-09-07T06:13:37.4046774Z * [new tag] trunk/8703debf669bc2238211bfd039f4ecdd8228b7f7 -> trunk/8703debf669bc2238211bfd039f4ecdd8228b7f7 2025-09-07T06:13:37.4047730Z * [new tag] trunk/874069fbe46e82da5cfa405e6c0deb12e89ff608 -> trunk/874069fbe46e82da5cfa405e6c0deb12e89ff608 2025-09-07T06:13:37.4048905Z * [new tag] trunk/8875d6e394da2fffd04f31b28bf258c94d4776a3 -> trunk/8875d6e394da2fffd04f31b28bf258c94d4776a3 2025-09-07T06:13:37.4049830Z * [new tag] trunk/88d94d17e8c5155451393afa6eb3bab48ab61c16 -> trunk/88d94d17e8c5155451393afa6eb3bab48ab61c16 2025-09-07T06:13:37.4050845Z * [new tag] trunk/890626632def7e0ef95a2d01e87a0e4627824a9f -> trunk/890626632def7e0ef95a2d01e87a0e4627824a9f 2025-09-07T06:13:37.4051962Z * [new tag] trunk/8975cda2520b7b1b5bc3b4d8213edf261fa82570 -> trunk/8975cda2520b7b1b5bc3b4d8213edf261fa82570 2025-09-07T06:13:37.4052956Z * [new tag] trunk/89d41d3f61d04f14730ec26f008a59bef6624610 -> trunk/89d41d3f61d04f14730ec26f008a59bef6624610 2025-09-07T06:13:37.4054308Z * [new tag] trunk/8bb213b6d599ef1273fe52f9b1f6d476056c3a41 -> trunk/8bb213b6d599ef1273fe52f9b1f6d476056c3a41 2025-09-07T06:13:37.4055245Z * [new tag] trunk/8e23a1227b5fb2e39afaa7d57c075a75b640a5af -> trunk/8e23a1227b5fb2e39afaa7d57c075a75b640a5af 2025-09-07T06:13:37.4056760Z * [new tag] trunk/8ec551bb354ab2b85fbbba9d461740a20366d248 -> trunk/8ec551bb354ab2b85fbbba9d461740a20366d248 2025-09-07T06:13:37.4057729Z * [new tag] trunk/8fd3c9ce919c8d5c645fd348bba517e948cbc29d -> trunk/8fd3c9ce919c8d5c645fd348bba517e948cbc29d 2025-09-07T06:13:37.4059014Z * [new tag] 
trunk/90f50f7e68e120d9574e6e3189e37b4280010ad9 -> trunk/90f50f7e68e120d9574e6e3189e37b4280010ad9 2025-09-07T06:13:37.4060057Z * [new tag] trunk/91f0bcf43fc0bc743350d491ac63b77e92054ac9 -> trunk/91f0bcf43fc0bc743350d491ac63b77e92054ac9 2025-09-07T06:13:37.4061107Z * [new tag] trunk/92576a594b8121f6b0b1b5a3ea16d08792fc68ab -> trunk/92576a594b8121f6b0b1b5a3ea16d08792fc68ab 2025-09-07T06:13:37.4062080Z * [new tag] trunk/92a43025e0baa1f2ce345f28d22913b518a1ab9d -> trunk/92a43025e0baa1f2ce345f28d22913b518a1ab9d 2025-09-07T06:13:37.4062920Z * [new tag] trunk/93fb23d6fae7c4e82c4239a1033e522088742634 -> trunk/93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:13:37.4063855Z * [new tag] trunk/9458d1ac3bd70c2af316a8ba95d2c6c9c1199c9c -> trunk/9458d1ac3bd70c2af316a8ba95d2c6c9c1199c9c 2025-09-07T06:13:37.4064857Z * [new tag] trunk/9480cdc0b61488c89a23c2f64f43b2dcedc8728e -> trunk/9480cdc0b61488c89a23c2f64f43b2dcedc8728e 2025-09-07T06:13:37.4066102Z * [new tag] trunk/9491d289b329e4ba4a9f5f5b1be7960671bb7840 -> trunk/9491d289b329e4ba4a9f5f5b1be7960671bb7840 2025-09-07T06:13:37.4066973Z * [new tag] trunk/9499c8761cd2067feb9877414e818f6fd00290f1 -> trunk/9499c8761cd2067feb9877414e818f6fd00290f1 2025-09-07T06:13:37.4067932Z * [new tag] trunk/95ee0bfea99d3d346d6502b91b497d2b35795504 -> trunk/95ee0bfea99d3d346d6502b91b497d2b35795504 2025-09-07T06:13:37.4068889Z * [new tag] trunk/98374612fc2febd686be20761e56bdc2424bc36a -> trunk/98374612fc2febd686be20761e56bdc2424bc36a 2025-09-07T06:13:37.4070077Z * [new tag] trunk/98efc9e93d8fc61eb53cb91378443617cb550500 -> trunk/98efc9e93d8fc61eb53cb91378443617cb550500 2025-09-07T06:13:37.4070974Z * [new tag] trunk/994f2a5dbcbdc915da39bf6f6ce4d1f5e74835c9 -> trunk/994f2a5dbcbdc915da39bf6f6ce4d1f5e74835c9 2025-09-07T06:13:37.4071912Z * [new tag] trunk/99f356fa58c8d726cef022d8710f5491291158f6 -> trunk/99f356fa58c8d726cef022d8710f5491291158f6 2025-09-07T06:13:37.4072944Z * [new tag] trunk/9a1c5c0a078b94d13ac5c1ae0d754d19fb73bf99 -> 
trunk/9a1c5c0a078b94d13ac5c1ae0d754d19fb73bf99 2025-09-07T06:13:37.4074056Z * [new tag] trunk/9a665ca3c472384e9d722bddba79e5a7680f1abd -> trunk/9a665ca3c472384e9d722bddba79e5a7680f1abd 2025-09-07T06:13:37.4075080Z * [new tag] trunk/9aedb3cd87b52160872173c177f61053d97bed57 -> trunk/9aedb3cd87b52160872173c177f61053d97bed57 2025-09-07T06:13:37.4076050Z * [new tag] trunk/9b81fe281da41f2421506339d26b027a468902f4 -> trunk/9b81fe281da41f2421506339d26b027a468902f4 2025-09-07T06:13:37.4076995Z * [new tag] trunk/9bdcee01f86e2969cff1140cdecfca13cb51816e -> trunk/9bdcee01f86e2969cff1140cdecfca13cb51816e 2025-09-07T06:13:37.4077939Z * [new tag] trunk/9c03d6be87eedc06e524e202e07a7e776551a839 -> trunk/9c03d6be87eedc06e524e202e07a7e776551a839 2025-09-07T06:13:37.4078838Z * [new tag] trunk/9c957723a0fedd9c637e63e023a613019e2cab60 -> trunk/9c957723a0fedd9c637e63e023a613019e2cab60 2025-09-07T06:13:37.4079765Z * [new tag] trunk/9e5247f51d81735e5f1e65e80588985fa93bccc5 -> trunk/9e5247f51d81735e5f1e65e80588985fa93bccc5 2025-09-07T06:13:37.4080760Z * [new tag] trunk/9eadb37cdd699f7e8e8177a5227bfeb16184ef26 -> trunk/9eadb37cdd699f7e8e8177a5227bfeb16184ef26 2025-09-07T06:13:37.4081741Z * [new tag] trunk/a00cdc1e4159db73c9ffb3f25e93e55877709a29 -> trunk/a00cdc1e4159db73c9ffb3f25e93e55877709a29 2025-09-07T06:13:37.4082657Z * [new tag] trunk/a02ee4a816d11380c6f564c1aba64d56af5ba705 -> trunk/a02ee4a816d11380c6f564c1aba64d56af5ba705 2025-09-07T06:13:37.4083562Z * [new tag] trunk/a3c7f77e50f900721817934120d60c2361b3c40d -> trunk/a3c7f77e50f900721817934120d60c2361b3c40d 2025-09-07T06:13:37.4084478Z * [new tag] trunk/a3d72b09ae12126a2b7d4a63a45ac100a882a802 -> trunk/a3d72b09ae12126a2b7d4a63a45ac100a882a802 2025-09-07T06:13:37.4085447Z * [new tag] trunk/a3e5466002791da609fcb069155d8ee347baee92 -> trunk/a3e5466002791da609fcb069155d8ee347baee92 2025-09-07T06:13:37.4086628Z * [new tag] trunk/a714437093ed196eee28f7de454cf4c41badc098 -> trunk/a714437093ed196eee28f7de454cf4c41badc098 
2025-09-07T06:13:37.4087957Z * [new tag] trunk/a75e8cd27098f290de0b7439685d05ce02e91356 -> trunk/a75e8cd27098f290de0b7439685d05ce02e91356 2025-09-07T06:13:37.4088722Z * [new tag] trunk/a8d6943d36c1c2a5f90d3573460695bad4b623ae -> trunk/a8d6943d36c1c2a5f90d3573460695bad4b623ae 2025-09-07T06:13:37.4089648Z * [new tag] trunk/a918bbad6ab20649ff82eefb48417ecbe96bcb34 -> trunk/a918bbad6ab20649ff82eefb48417ecbe96bcb34 2025-09-07T06:13:37.4090571Z * [new tag] trunk/a99d8d39bc842d6ebc3e368b178e4884d24b056e -> trunk/a99d8d39bc842d6ebc3e368b178e4884d24b056e 2025-09-07T06:13:37.4091519Z * [new tag] trunk/aac1a50a191b4102d566c9c1ea22f06d6c2e3f02 -> trunk/aac1a50a191b4102d566c9c1ea22f06d6c2e3f02 2025-09-07T06:13:37.4093007Z * [new tag] trunk/aad96a202244c7d0d120c04ba8db593edd8c0f92 -> trunk/aad96a202244c7d0d120c04ba8db593edd8c0f92 2025-09-07T06:13:37.4094295Z * [new tag] trunk/ab643e4dbbaf7b663d4237514cbf01af9b11565c -> trunk/ab643e4dbbaf7b663d4237514cbf01af9b11565c 2025-09-07T06:13:37.4095226Z * [new tag] trunk/abc447174cd2cf8591edbc70a9f836f9a5779f47 -> trunk/abc447174cd2cf8591edbc70a9f836f9a5779f47 2025-09-07T06:13:37.4096305Z * [new tag] trunk/acece97c3a9dceb63194e314da93fdf37cf15a0d -> trunk/acece97c3a9dceb63194e314da93fdf37cf15a0d 2025-09-07T06:13:37.4097464Z * [new tag] trunk/adae7f66aacf3f248c3101b858cf98d5809119fa -> trunk/adae7f66aacf3f248c3101b858cf98d5809119fa 2025-09-07T06:13:37.4098418Z * [new tag] trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c -> trunk/ae0edc133e61e3b16caf0b2ee0ff3f33ab72af4c 2025-09-07T06:13:37.4099400Z * [new tag] trunk/aed33a8fcbd60b052d4559d261390c5797129c6d -> trunk/aed33a8fcbd60b052d4559d261390c5797129c6d 2025-09-07T06:13:37.4100547Z * [new tag] trunk/b04e922712080a3652e438d05e8bb74e0cd2d238 -> trunk/b04e922712080a3652e438d05e8bb74e0cd2d238 2025-09-07T06:13:37.4101523Z * [new tag] trunk/b0a3e58dd71c1a039ac0ef51e5bd8f704f632f6f -> trunk/b0a3e58dd71c1a039ac0ef51e5bd8f704f632f6f 2025-09-07T06:13:37.4102526Z * [new tag] 
trunk/b16d3f4c8c01d461c2f01064e9ca5fa2b33f5cf1 -> trunk/b16d3f4c8c01d461c2f01064e9ca5fa2b33f5cf1 2025-09-07T06:13:37.4103470Z * [new tag] trunk/b18bb6796f210a183e687d9d64984a5a9d13cf09 -> trunk/b18bb6796f210a183e687d9d64984a5a9d13cf09 2025-09-07T06:13:37.4104583Z * [new tag] trunk/b1bb98ddebdd3e41bf7987372409bdce96ae55de -> trunk/b1bb98ddebdd3e41bf7987372409bdce96ae55de 2025-09-07T06:13:37.4105523Z * [new tag] trunk/b2b4add0e754411372060e1d7b4057a66439172b -> trunk/b2b4add0e754411372060e1d7b4057a66439172b 2025-09-07T06:13:37.4106544Z * [new tag] trunk/b2c7b9ad2dc5a7c0b61febd307761bd5bc2f0f05 -> trunk/b2c7b9ad2dc5a7c0b61febd307761bd5bc2f0f05 2025-09-07T06:13:37.4107491Z * [new tag] trunk/b40d9432be44a6b5974ee62e7d19c3c61c5ece37 -> trunk/b40d9432be44a6b5974ee62e7d19c3c61c5ece37 2025-09-07T06:13:37.4108481Z * [new tag] trunk/b4ad38279b178b7bd14355123c1101e2e853e77b -> trunk/b4ad38279b178b7bd14355123c1101e2e853e77b 2025-09-07T06:13:37.4109457Z * [new tag] trunk/b67c41039835bd9b20b83cd6233e86baaa5f5dde -> trunk/b67c41039835bd9b20b83cd6233e86baaa5f5dde 2025-09-07T06:13:37.4110669Z * [new tag] trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c -> trunk/b6d0a9ea9056ede4f7024dbf3bd6c43be3aff49c 2025-09-07T06:13:37.4111565Z * [new tag] trunk/b7dad7dd49448c88d0751fa2e29c70afe985f734 -> trunk/b7dad7dd49448c88d0751fa2e29c70afe985f734 2025-09-07T06:13:37.4112536Z * [new tag] trunk/b7e207ca9f046ddd716076965a0cce403ba99052 -> trunk/b7e207ca9f046ddd716076965a0cce403ba99052 2025-09-07T06:13:37.4113624Z * [new tag] trunk/b919560c4a7010e2d89facee25586269a994746e -> trunk/b919560c4a7010e2d89facee25586269a994746e 2025-09-07T06:13:37.4114600Z * [new tag] trunk/b9ba612f7a968f7b27e121ca8f4d0a4d954f5354 -> trunk/b9ba612f7a968f7b27e121ca8f4d0a4d954f5354 2025-09-07T06:13:37.4115752Z * [new tag] trunk/ba7f546ccccb5e0b36d9070dc25f26a9647f89f8 -> trunk/ba7f546ccccb5e0b36d9070dc25f26a9647f89f8 2025-09-07T06:13:37.4116734Z * [new tag] trunk/bb950284c7e72905994bc25dd436c10e48088d85 -> 
trunk/bb950284c7e72905994bc25dd436c10e48088d85 2025-09-07T06:13:37.4117686Z * [new tag] trunk/bbedc71fd3267c639c38b4ec25eaa22f973d9c4d -> trunk/bbedc71fd3267c639c38b4ec25eaa22f973d9c4d 2025-09-07T06:13:37.4118503Z * [new tag] trunk/bc4db2c27fce6ff1648bdc5af31ec225d2a31f37 -> trunk/bc4db2c27fce6ff1648bdc5af31ec225d2a31f37 2025-09-07T06:13:37.4119500Z * [new tag] trunk/bc505977fb66677a09c31155c987330fbb18a865 -> trunk/bc505977fb66677a09c31155c987330fbb18a865 2025-09-07T06:13:37.4120492Z * [new tag] trunk/bd39e47feea7326afb5bbb67fcb1e69279239527 -> trunk/bd39e47feea7326afb5bbb67fcb1e69279239527 2025-09-07T06:13:37.4121606Z * [new tag] trunk/be5b03dde96638f25ffd732a4fed7e41b4cf40e1 -> trunk/be5b03dde96638f25ffd732a4fed7e41b4cf40e1 2025-09-07T06:13:37.4122491Z * [new tag] trunk/bffc7dd1f374d8408911cd22c6b3d6df39ded9b3 -> trunk/bffc7dd1f374d8408911cd22c6b3d6df39ded9b3 2025-09-07T06:13:37.4123471Z * [new tag] trunk/c024b1f5a18d5c5aee5cc2acdd4c52b24b93ffcf -> trunk/c024b1f5a18d5c5aee5cc2acdd4c52b24b93ffcf 2025-09-07T06:13:37.4124394Z * [new tag] trunk/c0983e6cc0acf71689e1851d12609e00b3f59371 -> trunk/c0983e6cc0acf71689e1851d12609e00b3f59371 2025-09-07T06:13:37.4125551Z * [new tag] trunk/c10195e723eeeedd099ed8b73eda7184ca618fad -> trunk/c10195e723eeeedd099ed8b73eda7184ca618fad 2025-09-07T06:13:37.4126549Z * [new tag] trunk/c157cf6488ade6a7ee2ce2d25b059e1335630a99 -> trunk/c157cf6488ade6a7ee2ce2d25b059e1335630a99 2025-09-07T06:13:37.4127516Z * [new tag] trunk/c2a30246172fd71d56529907ffd3c27b76b1f3a7 -> trunk/c2a30246172fd71d56529907ffd3c27b76b1f3a7 2025-09-07T06:13:37.4128496Z * [new tag] trunk/c32111149921b48bfef909293f1049e21619ed76 -> trunk/c32111149921b48bfef909293f1049e21619ed76 2025-09-07T06:13:37.4129362Z * [new tag] trunk/c37103234afc832dcad307e9016230810957c9d5 -> trunk/c37103234afc832dcad307e9016230810957c9d5 2025-09-07T06:13:37.4130318Z * [new tag] trunk/c3ceca2995cd35e1376c4b0704669bff1a81e836 -> trunk/c3ceca2995cd35e1376c4b0704669bff1a81e836 
2025-09-07T06:13:37.4131332Z * [new tag] trunk/c3d54dea9febb1236d48d19e5d4876a63f2e20fd -> trunk/c3d54dea9febb1236d48d19e5d4876a63f2e20fd 2025-09-07T06:13:37.4132291Z * [new tag] trunk/c465b3d52c5687fe910d35a5c75341b77f821741 -> trunk/c465b3d52c5687fe910d35a5c75341b77f821741 2025-09-07T06:13:37.4133552Z * [new tag] trunk/c5b8a10be5e89396da916d1069ffcb7135f0372b -> trunk/c5b8a10be5e89396da916d1069ffcb7135f0372b 2025-09-07T06:13:37.4134499Z * [new tag] trunk/c7e41071a08f4045bc11ab60ec366d7357d56e30 -> trunk/c7e41071a08f4045bc11ab60ec366d7357d56e30 2025-09-07T06:13:37.4135548Z * [new tag] trunk/c98ddaca6d2e19ca37aff00c4ff0cda1e9a6ff65 -> trunk/c98ddaca6d2e19ca37aff00c4ff0cda1e9a6ff65 2025-09-07T06:13:37.4136502Z * [new tag] trunk/cb1e31362c7b53acf4ac95b9f8878064c184f03b -> trunk/cb1e31362c7b53acf4ac95b9f8878064c184f03b 2025-09-07T06:13:37.4137538Z * [new tag] trunk/cbfb005f7cce79974795b148e265f594f59477c8 -> trunk/cbfb005f7cce79974795b148e265f594f59477c8 2025-09-07T06:13:37.4138584Z * [new tag] trunk/cc5bdd12401bda835291d2f3cb297132ebdbf358 -> trunk/cc5bdd12401bda835291d2f3cb297132ebdbf358 2025-09-07T06:13:37.4139794Z * [new tag] trunk/cd529b686d54bbaa443f5b310140de48422d96c7 -> trunk/cd529b686d54bbaa443f5b310140de48422d96c7 2025-09-07T06:13:37.4140788Z * [new tag] trunk/cec0ff122815582af5302360aff03676558c5c87 -> trunk/cec0ff122815582af5302360aff03676558c5c87 2025-09-07T06:13:37.4141766Z * [new tag] trunk/d11720efdb563d02cf4f7d324311fb15a755268e -> trunk/d11720efdb563d02cf4f7d324311fb15a755268e 2025-09-07T06:13:37.4142752Z * [new tag] trunk/d1706d9128ae24d9048167e80d3fe5196d19035e -> trunk/d1706d9128ae24d9048167e80d3fe5196d19035e 2025-09-07T06:13:37.4143810Z * [new tag] trunk/d1a15abfdcaef138f2d9e93a9f46be44f30b766d -> trunk/d1a15abfdcaef138f2d9e93a9f46be44f30b766d 2025-09-07T06:13:37.4145029Z * [new tag] trunk/d232a95d4a79404ca05c1f52d37fde7339dcdf49 -> trunk/d232a95d4a79404ca05c1f52d37fde7339dcdf49 2025-09-07T06:13:37.4146127Z * [new tag] 
trunk/d2d4c8e9b2371c9aacfb771d9402ac7427b9778e -> trunk/d2d4c8e9b2371c9aacfb771d9402ac7427b9778e 2025-09-07T06:13:37.4147134Z * [new tag] trunk/d33840c542b387ab08ba49aa6c45aa9567fd9be7 -> trunk/d33840c542b387ab08ba49aa6c45aa9567fd9be7 2025-09-07T06:13:37.4148093Z * [new tag] trunk/d5643e8f3a648a99636bfa1f2a41d54bd3c0d0f1 -> trunk/d5643e8f3a648a99636bfa1f2a41d54bd3c0d0f1 2025-09-07T06:13:37.4149024Z * [new tag] trunk/d5b38410b5b6cf75c7a7389972777a6497926ee7 -> trunk/d5b38410b5b6cf75c7a7389972777a6497926ee7 2025-09-07T06:13:37.4149832Z * [new tag] trunk/d5e0f4202ba14632e4d14862ace096609e763462 -> trunk/d5e0f4202ba14632e4d14862ace096609e763462 2025-09-07T06:13:37.4150992Z * [new tag] trunk/d636c181f9140a7b59be10b36eae23039fc2bb72 -> trunk/d636c181f9140a7b59be10b36eae23039fc2bb72 2025-09-07T06:13:37.4152645Z * [new tag] trunk/d64718503728001a1e78168fd7f2d4ff23e57285 -> trunk/d64718503728001a1e78168fd7f2d4ff23e57285 2025-09-07T06:13:37.4154086Z * [new tag] trunk/d67c29ad22670320d676b02e394274af34e8e643 -> trunk/d67c29ad22670320d676b02e394274af34e8e643 2025-09-07T06:13:37.4155058Z * [new tag] trunk/d6b74568e2c98ce58ecc145b72ac66d4caf7ce95 -> trunk/d6b74568e2c98ce58ecc145b72ac66d4caf7ce95 2025-09-07T06:13:37.4156049Z * [new tag] trunk/d711f27845abd45007ccab6076649ebd896c2661 -> trunk/d711f27845abd45007ccab6076649ebd896c2661 2025-09-07T06:13:37.4157019Z * [new tag] trunk/d9d6dde0f42d4bcc8c97671ac50d5096c7e500ab -> trunk/d9d6dde0f42d4bcc8c97671ac50d5096c7e500ab 2025-09-07T06:13:37.4158055Z * [new tag] trunk/da4db4b33d1fdd046650cf19fdbac581a19bf2f9 -> trunk/da4db4b33d1fdd046650cf19fdbac581a19bf2f9 2025-09-07T06:13:37.4158860Z * [new tag] trunk/dac8a4b91c01c3bbc96f54e621b1ea4ffdbd29d1 -> trunk/dac8a4b91c01c3bbc96f54e621b1ea4ffdbd29d1 2025-09-07T06:13:37.4159988Z * [new tag] trunk/dbec08729fb9848bebed6048c63831b87170d061 -> trunk/dbec08729fb9848bebed6048c63831b87170d061 2025-09-07T06:13:37.4160719Z * [new tag] trunk/dcf385395d838f38c8dca25913578230dd43099a -> 
trunk/dcf385395d838f38c8dca25913578230dd43099a 2025-09-07T06:13:37.4161669Z * [new tag] trunk/dd2519abe83ec3c40d4797492434e41fe3b47e17 -> trunk/dd2519abe83ec3c40d4797492434e41fe3b47e17 2025-09-07T06:13:37.4162687Z * [new tag] trunk/dec72ea4b006dd0fbcaaaa106ad273d73807ab9d -> trunk/dec72ea4b006dd0fbcaaaa106ad273d73807ab9d 2025-09-07T06:13:37.4163667Z * [new tag] trunk/e0a62b266c021b910ce6dc02a6c9429210487717 -> trunk/e0a62b266c021b910ce6dc02a6c9429210487717 2025-09-07T06:13:37.4164865Z * [new tag] trunk/e19e02c84c9dcc408375e5cae3b0709c18b99228 -> trunk/e19e02c84c9dcc408375e5cae3b0709c18b99228 2025-09-07T06:13:37.4165995Z * [new tag] trunk/e304ea4e69d3a7deeb7e48c7450c214a4c953937 -> trunk/e304ea4e69d3a7deeb7e48c7450c214a4c953937 2025-09-07T06:13:37.4166980Z * [new tag] trunk/e3068cdb446adefb5a875616ba37a60235391439 -> trunk/e3068cdb446adefb5a875616ba37a60235391439 2025-09-07T06:13:37.4168070Z * [new tag] trunk/e381d4b0205d5f126c1de534f867ba776f7c3ee6 -> trunk/e381d4b0205d5f126c1de534f867ba776f7c3ee6 2025-09-07T06:13:37.4169056Z * [new tag] trunk/e4bd0ff4f8981b805df32ea5b3550621965ea4f2 -> trunk/e4bd0ff4f8981b805df32ea5b3550621965ea4f2 2025-09-07T06:13:37.4169897Z * [new tag] trunk/e532c9d4f1cdcbc1ea9628f55b9813e77847bdc7 -> trunk/e532c9d4f1cdcbc1ea9628f55b9813e77847bdc7 2025-09-07T06:13:37.4170883Z * [new tag] trunk/e92cd9415377403b6e90585e764639e2e0b5973b -> trunk/e92cd9415377403b6e90585e764639e2e0b5973b 2025-09-07T06:13:37.4171819Z * [new tag] trunk/e9481b6617b5576b099d8ca5798111592e9ad090 -> trunk/e9481b6617b5576b099d8ca5798111592e9ad090 2025-09-07T06:13:37.4172725Z * [new tag] trunk/ea1883dfd3e42defe37b11202b878bb76defa087 -> trunk/ea1883dfd3e42defe37b11202b878bb76defa087 2025-09-07T06:13:37.4174080Z * [new tag] trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 -> trunk/eac3d6f04cfbbebe3d470dacd216da7d4b1f95a8 2025-09-07T06:13:37.4174994Z * [new tag] trunk/eb18d32bda75189494d955aa001ade15f10333de -> trunk/eb18d32bda75189494d955aa001ade15f10333de 
2025-09-07T06:13:37.4176005Z * [new tag] trunk/ef3be6726f7ff4b77c22db10cec5b686f9107ea9 -> trunk/ef3be6726f7ff4b77c22db10cec5b686f9107ea9 2025-09-07T06:13:37.4177002Z * [new tag] trunk/ef8aabd42422725026cb4dbf48aafa9efa226a04 -> trunk/ef8aabd42422725026cb4dbf48aafa9efa226a04 2025-09-07T06:13:37.4178245Z * [new tag] trunk/f00445b43eee57e20bb9316fa796ca23bf73373b -> trunk/f00445b43eee57e20bb9316fa796ca23bf73373b 2025-09-07T06:13:37.4179196Z * [new tag] trunk/f0c391102b754e3b145e8c59231d2df563487e37 -> trunk/f0c391102b754e3b145e8c59231d2df563487e37 2025-09-07T06:13:37.4180320Z * [new tag] trunk/f27985b7e796fb66a1b476284ba42d8cb360a751 -> trunk/f27985b7e796fb66a1b476284ba42d8cb360a751 2025-09-07T06:13:37.4181481Z * [new tag] trunk/f36f285953700f971552083a5da9d0ceacb63bbd -> trunk/f36f285953700f971552083a5da9d0ceacb63bbd 2025-09-07T06:13:37.4182437Z * [new tag] trunk/f3cebec39ebc110e1c8b06e741896585f7892dbb -> trunk/f3cebec39ebc110e1c8b06e741896585f7892dbb 2025-09-07T06:13:37.4183267Z * [new tag] trunk/f4c33cd44acac92c0b451a04da20ebe9370e5b0c -> trunk/f4c33cd44acac92c0b451a04da20ebe9370e5b0c 2025-09-07T06:13:37.4184335Z * [new tag] trunk/f612045ce105f008b2b675e2fc870163babeb2e8 -> trunk/f612045ce105f008b2b675e2fc870163babeb2e8 2025-09-07T06:13:37.4185655Z * [new tag] trunk/f8746b878dfc1e9639d42cbde832e9b9e792c86c -> trunk/f8746b878dfc1e9639d42cbde832e9b9e792c86c 2025-09-07T06:13:37.4186595Z * [new tag] trunk/f8ffa9194e26523e5f976d4a824d5cc58922727c -> trunk/f8ffa9194e26523e5f976d4a824d5cc58922727c 2025-09-07T06:13:37.4187558Z * [new tag] trunk/f981a7fa5230b98974291fdde32fe8488bc5d469 -> trunk/f981a7fa5230b98974291fdde32fe8488bc5d469 2025-09-07T06:13:37.4188514Z * [new tag] trunk/fbf3d2027daabbcb44d0af274b139be2a248a4f7 -> trunk/fbf3d2027daabbcb44d0af274b139be2a248a4f7 2025-09-07T06:13:37.4189760Z * [new tag] trunk/fca2601c9d628e1bd2d75c7318cd22c4e8c832aa -> trunk/fca2601c9d628e1bd2d75c7318cd22c4e8c832aa 2025-09-07T06:13:37.4190727Z * [new tag] 
trunk/fea20775ad96bdca972a1811d7d3372f368614ab -> trunk/fea20775ad96bdca972a1811d7d3372f368614ab 2025-09-07T06:13:37.4191505Z * [new tag] trunk/fefee081642f87419a21dc852f7167d4640443cd -> trunk/fefee081642f87419a21dc852f7167d4640443cd 2025-09-07T06:13:37.4192465Z * [new tag] v0.1.1 -> v0.1.1 2025-09-07T06:13:37.4196397Z * [new tag] v0.1.10 -> v0.1.10 2025-09-07T06:13:37.4197424Z * [new tag] v0.1.11 -> v0.1.11 2025-09-07T06:13:37.4198310Z * [new tag] v0.1.12 -> v0.1.12 2025-09-07T06:13:37.4199367Z * [new tag] v0.1.2 -> v0.1.2 2025-09-07T06:13:37.4200118Z * [new tag] v0.1.3 -> v0.1.3 2025-09-07T06:13:37.4201067Z * [new tag] v0.1.4 -> v0.1.4 2025-09-07T06:13:37.4201861Z * [new tag] v0.1.5 -> v0.1.5 2025-09-07T06:13:37.4202827Z * [new tag] v0.1.6 -> v0.1.6 2025-09-07T06:13:37.4203705Z * [new tag] v0.1.7 -> v0.1.7 2025-09-07T06:13:37.4204692Z * [new tag] v0.1.8 -> v0.1.8 2025-09-07T06:13:37.4205457Z * [new tag] v0.1.9 -> v0.1.9 2025-09-07T06:13:37.4206407Z * [new tag] v0.2.0 -> v0.2.0 2025-09-07T06:13:37.4207325Z * [new tag] v0.3.0 -> v0.3.0 2025-09-07T06:13:37.4208352Z * [new tag] v0.3.1 -> v0.3.1 2025-09-07T06:13:37.4209230Z * [new tag] v0.4.0 -> v0.4.0 2025-09-07T06:13:37.4210072Z * [new tag] v0.4.1 -> v0.4.1 2025-09-07T06:13:37.4210879Z * [new tag] v1.0.0 -> v1.0.0 2025-09-07T06:13:37.4211915Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-09-07T06:13:37.4212880Z * [new tag] v1.0.1 -> v1.0.1 2025-09-07T06:13:37.4214085Z * [new tag] v1.0rc0 -> v1.0rc0 2025-09-07T06:13:37.4214734Z * [new tag] v1.0rc1 -> v1.0rc1 2025-09-07T06:13:37.4215706Z * [new tag] v1.1.0 -> v1.1.0 2025-09-07T06:13:37.4216652Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-09-07T06:13:37.4217879Z * [new tag] v1.10.0 -> v1.10.0 2025-09-07T06:13:37.4219003Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-09-07T06:13:37.4219975Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-09-07T06:13:37.4220635Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-09-07T06:13:37.4221669Z * [new tag] v1.10.1 -> v1.10.1 2025-09-07T06:13:37.4222500Z * 
[new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-09-07T06:13:37.4223146Z * [new tag] v1.10.2 -> v1.10.2 2025-09-07T06:13:37.4223868Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-09-07T06:13:37.4224903Z * [new tag] v1.11.0 -> v1.11.0 2025-09-07T06:13:37.4225996Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-09-07T06:13:37.4226972Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-09-07T06:13:37.4227955Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-09-07T06:13:37.4228859Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-09-07T06:13:37.4229775Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-09-07T06:13:37.4230422Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-09-07T06:13:37.4231111Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 2025-09-07T06:13:37.4232117Z * [new tag] v1.12.0 -> v1.12.0 2025-09-07T06:13:37.4233042Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-09-07T06:13:37.4233944Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-09-07T06:13:37.4234856Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-09-07T06:13:37.4235790Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-09-07T06:13:37.4236710Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-09-07T06:13:37.4237756Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-09-07T06:13:37.4238306Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-09-07T06:13:37.4238988Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-09-07T06:13:37.4239681Z * [new tag] v1.12.1 -> v1.12.1 2025-09-07T06:13:37.4240755Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-09-07T06:13:37.4241652Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-09-07T06:13:37.4242625Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-09-07T06:13:37.4243734Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-09-07T06:13:37.4244387Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-09-07T06:13:37.4245346Z * [new tag] v1.13.0 -> v1.13.0 2025-09-07T06:13:37.4246235Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-09-07T06:13:37.4247172Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-09-07T06:13:37.4248059Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-09-07T06:13:37.4249115Z * 
[new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-09-07T06:13:37.4249758Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-09-07T06:13:37.4250448Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-09-07T06:13:37.4251399Z * [new tag] v1.13.1 -> v1.13.1 2025-09-07T06:13:37.4252564Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-09-07T06:13:37.4253767Z * [new tag] v1.2.0 -> v1.2.0 2025-09-07T06:13:37.4254760Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-09-07T06:13:37.4255705Z * [new tag] v1.3.0 -> v1.3.0 2025-09-07T06:13:37.4256679Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-09-07T06:13:37.4257382Z * [new tag] v1.3.1 -> v1.3.1 2025-09-07T06:13:37.4258328Z * [new tag] v1.4.0 -> v1.4.0 2025-09-07T06:13:37.4259222Z * [new tag] v1.4.0a0 -> v1.4.0a0 2025-09-07T06:13:37.4259880Z * [new tag] v1.4.1 -> v1.4.1 2025-09-07T06:13:37.4260961Z * [new tag] v1.5.0 -> v1.5.0 2025-09-07T06:13:37.4262031Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-09-07T06:13:37.4263009Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-09-07T06:13:37.4264052Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-09-07T06:13:37.4265059Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-09-07T06:13:37.4265880Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-09-07T06:13:37.4266875Z * [new tag] v1.5.1 -> v1.5.1 2025-09-07T06:13:37.4267587Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-09-07T06:13:37.4268259Z * [new tag] v1.6.0 -> v1.6.0 2025-09-07T06:13:37.4269185Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-09-07T06:13:37.4270160Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-09-07T06:13:37.4271125Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-09-07T06:13:37.4272109Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-09-07T06:13:37.4273035Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-09-07T06:13:37.4273949Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-09-07T06:13:37.4274685Z * [new tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-09-07T06:13:37.4275580Z * [new tag] v1.7.0 -> v1.7.0 2025-09-07T06:13:37.4276587Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-09-07T06:13:37.4277575Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 
2025-09-07T06:13:37.4278491Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-09-07T06:13:37.4279124Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-09-07T06:13:37.4280101Z * [new tag] v1.7.1 -> v1.7.1 2025-09-07T06:13:37.4281132Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-09-07T06:13:37.4282111Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-09-07T06:13:37.4282779Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-09-07T06:13:37.4283749Z * [new tag] v1.8.0 -> v1.8.0 2025-09-07T06:13:37.4284404Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-09-07T06:13:37.4285377Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-09-07T06:13:37.4286358Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-09-07T06:13:37.4287107Z * [new tag] v1.8.0-rc4 -> v1.8.0-rc4 2025-09-07T06:13:37.4287780Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-09-07T06:13:37.4288507Z * [new tag] v1.8.1 -> v1.8.1 2025-09-07T06:13:37.4289465Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-09-07T06:13:37.4290109Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-09-07T06:13:37.4290804Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-09-07T06:13:37.4293786Z * [new tag] v1.8.2 -> v1.8.2 2025-09-07T06:13:37.4293959Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-09-07T06:13:37.4294777Z * [new tag] v1.9.0 -> v1.9.0 2025-09-07T06:13:37.4295853Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-09-07T06:13:37.4296860Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-09-07T06:13:37.4297860Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-09-07T06:13:37.4298557Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-09-07T06:13:37.4299570Z * [new tag] v1.9.1 -> v1.9.1 2025-09-07T06:13:37.4300749Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-09-07T06:13:37.4301417Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-09-07T06:13:37.4302466Z * [new tag] v2.0.0 -> v2.0.0 2025-09-07T06:13:37.4303416Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-09-07T06:13:37.4304387Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-09-07T06:13:37.4305499Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-09-07T06:13:37.4306543Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 
2025-09-07T06:13:37.4307541Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-09-07T06:13:37.4308210Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-09-07T06:13:37.4309238Z * [new tag] v2.0.1 -> v2.0.1 2025-09-07T06:13:37.4310209Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-09-07T06:13:37.4310884Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-09-07T06:13:37.4312179Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-09-07T06:13:37.4312968Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-09-07T06:13:37.4314395Z * [new tag] v2.1.0 -> v2.1.0 2025-09-07T06:13:37.4315298Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-09-07T06:13:37.4316278Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-09-07T06:13:37.4317289Z * [new tag] v2.1.0-rc3 -> v2.1.0-rc3 2025-09-07T06:13:37.4318273Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-09-07T06:13:37.4319262Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-09-07T06:13:37.4319926Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-09-07T06:13:37.4320918Z * [new tag] v2.1.1 -> v2.1.1 2025-09-07T06:13:37.4321885Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-09-07T06:13:37.4322776Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-09-07T06:13:37.4323906Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-09-07T06:13:37.4324883Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-09-07T06:13:37.4325723Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-09-07T06:13:37.4326428Z * [new tag] v2.1.1-rc6 -> v2.1.1-rc6 2025-09-07T06:13:37.4327388Z * [new tag] v2.1.2 -> v2.1.2 2025-09-07T06:13:37.4328415Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-09-07T06:13:37.4329402Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-09-07T06:13:37.4330088Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-09-07T06:13:37.4331091Z * [new tag] v2.2.0 -> v2.2.0 2025-09-07T06:13:37.4332063Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-09-07T06:13:37.4333200Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-09-07T06:13:37.4334212Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-09-07T06:13:37.4335108Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-09-07T06:13:37.4336059Z * [new tag] v2.2.0-rc5 -> 
v2.2.0-rc5 2025-09-07T06:13:37.4337139Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-09-07T06:13:37.4337815Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-09-07T06:13:37.4338688Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-09-07T06:13:37.4339693Z * [new tag] v2.2.1 -> v2.2.1 2025-09-07T06:13:37.4340719Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-09-07T06:13:37.4341410Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-09-07T06:13:37.4342136Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-09-07T06:13:37.4342865Z * [new tag] v2.2.2 -> v2.2.2 2025-09-07T06:13:37.4344010Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-09-07T06:13:37.4344693Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-09-07T06:13:37.4345523Z * [new tag] v2.2.2-rc3 -> v2.2.2-rc3 2025-09-07T06:13:37.4346503Z * [new tag] v2.3.0 -> v2.3.0 2025-09-07T06:13:37.4347425Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-09-07T06:13:37.4348426Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-09-07T06:13:37.4349421Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-09-07T06:13:37.4350080Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-09-07T06:13:37.4351109Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-09-07T06:13:37.4352100Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-09-07T06:13:37.4352882Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-09-07T06:13:37.4353857Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-09-07T06:13:37.4354496Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-09-07T06:13:37.4355525Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-09-07T06:13:37.4356461Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-09-07T06:13:37.4357098Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-09-07T06:13:37.4357808Z * [new tag] v2.3.1 -> v2.3.1 2025-09-07T06:13:37.4358801Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-09-07T06:13:37.4359774Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-09-07T06:13:37.4360706Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-09-07T06:13:37.4361690Z * [new tag] v2.4.0 -> v2.4.0 2025-09-07T06:13:37.4362646Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-09-07T06:13:37.4363495Z * [new tag] 
v2.4.0-rc2 -> v2.4.0-rc2 2025-09-07T06:13:37.4364476Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-09-07T06:13:37.4365643Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-09-07T06:13:37.4366663Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-09-07T06:13:37.4367630Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-09-07T06:13:37.4368627Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-09-07T06:13:37.4369566Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-09-07T06:13:37.4370560Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-09-07T06:13:37.4371696Z * [new tag] v2.4.1 -> v2.4.1 2025-09-07T06:13:37.4372835Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-09-07T06:13:37.4374141Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 2025-09-07T06:13:37.4375200Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-09-07T06:13:37.4376156Z * [new tag] v2.5.0 -> v2.5.0 2025-09-07T06:13:37.4377204Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-09-07T06:13:37.4377899Z * [new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-09-07T06:13:37.4378915Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-09-07T06:13:37.4379839Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-09-07T06:13:37.4380839Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-09-07T06:13:37.4381828Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-09-07T06:13:37.4382927Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-09-07T06:13:37.4383951Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-09-07T06:13:37.4385095Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-09-07T06:13:37.4386068Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-09-07T06:13:37.4386727Z * [new tag] v2.5.1 -> v2.5.1 2025-09-07T06:13:37.4387466Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-09-07T06:13:37.4388166Z * [new tag] v2.6.0 -> v2.6.0 2025-09-07T06:13:37.4389232Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-09-07T06:13:37.4390274Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-09-07T06:13:37.4391319Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-09-07T06:13:37.4392233Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-09-07T06:13:37.4396233Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 
2025-09-07T06:13:37.4397355Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-09-07T06:13:37.4398408Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-09-07T06:13:37.4399472Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-09-07T06:13:37.4400620Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-09-07T06:13:37.4401821Z * [new tag] v2.7.0 -> v2.7.0 2025-09-07T06:13:37.4402786Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-09-07T06:13:37.4403520Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-09-07T06:13:37.4404810Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-09-07T06:13:37.4405796Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-09-07T06:13:37.4406779Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-09-07T06:13:37.4407665Z * [new tag] v2.7.0-rc5 -> v2.7.0-rc5 2025-09-07T06:13:37.4408620Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-09-07T06:13:37.4409570Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-09-07T06:13:37.4410619Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-09-07T06:13:37.4411640Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-09-07T06:13:37.4412311Z * [new tag] v2.7.1 -> v2.7.1 2025-09-07T06:13:37.4413766Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-09-07T06:13:37.4414874Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-09-07T06:13:37.4416067Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-09-07T06:13:37.4417148Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-09-07T06:13:37.4418087Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-09-07T06:13:37.4418813Z * [new tag] v2.8.0 -> v2.8.0 2025-09-07T06:13:37.4419873Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-09-07T06:13:37.4420913Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-09-07T06:13:37.4421992Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-09-07T06:13:37.4423036Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-09-07T06:13:37.4424053Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-09-07T06:13:37.4425197Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-09-07T06:13:37.4426209Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-09-07T06:13:37.4427171Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-09-07T06:13:37.4428168Z * [new tag] 
whc_flight_1 -> whc_flight_1 2025-09-07T06:13:37.4429147Z * [new tag] whc_flight_2 -> whc_flight_2 2025-09-07T06:13:37.4429936Z * [new tag] whc_flight_4 -> whc_flight_4 2025-09-07T06:13:37.5151932Z [command]/usr/bin/git rev-parse --verify --quiet 93fb23d6fae7c4e82c4239a1033e522088742634^{object} 2025-09-07T06:13:37.5182475Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:13:37.5184510Z ##[endgroup] 2025-09-07T06:13:37.5184808Z ##[group]Determining the checkout info 2025-09-07T06:13:37.5185908Z ##[endgroup] 2025-09-07T06:13:37.5190180Z [command]/usr/bin/git sparse-checkout disable 2025-09-07T06:13:37.5236191Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-09-07T06:13:37.5285563Z ##[group]Checking out the ref 2025-09-07T06:13:37.5288036Z [command]/usr/bin/git checkout --progress --force 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:13:38.5531748Z Updating files: 80% (15712/19405) 2025-09-07T06:13:38.5790870Z Updating files: 81% (15719/19405) 2025-09-07T06:13:38.6023777Z Updating files: 82% (15913/19405) 2025-09-07T06:13:38.6148530Z Updating files: 83% (16107/19405) 2025-09-07T06:13:38.6291406Z Updating files: 84% (16301/19405) 2025-09-07T06:13:38.6454394Z Updating files: 85% (16495/19405) 2025-09-07T06:13:38.6595294Z Updating files: 86% (16689/19405) 2025-09-07T06:13:38.6738655Z Updating files: 87% (16883/19405) 2025-09-07T06:13:38.6848898Z Updating files: 88% (17077/19405) 2025-09-07T06:13:38.6993701Z Updating files: 89% (17271/19405) 2025-09-07T06:13:38.7171797Z Updating files: 90% (17465/19405) 2025-09-07T06:13:38.7289728Z Updating files: 91% (17659/19405) 2025-09-07T06:13:38.7440743Z Updating files: 92% (17853/19405) 2025-09-07T06:13:38.7635211Z Updating files: 93% (18047/19405) 2025-09-07T06:13:38.7851614Z Updating files: 94% (18241/19405) 2025-09-07T06:13:38.8012428Z Updating files: 95% (18435/19405) 2025-09-07T06:13:38.8175243Z Updating files: 96% (18629/19405) 2025-09-07T06:13:38.8365718Z Updating files: 97% 
(18823/19405) 2025-09-07T06:13:38.8643262Z Updating files: 98% (19017/19405) 2025-09-07T06:13:38.8808124Z Updating files: 99% (19211/19405) 2025-09-07T06:13:38.8808607Z Updating files: 100% (19405/19405) 2025-09-07T06:13:38.8808965Z Updating files: 100% (19405/19405), done. 2025-09-07T06:13:38.9086024Z Note: switching to '93fb23d6fae7c4e82c4239a1033e522088742634'. 2025-09-07T06:13:38.9086697Z 2025-09-07T06:13:38.9087029Z You are in 'detached HEAD' state. You can look around, make experimental 2025-09-07T06:13:38.9087654Z changes and commit them, and you can discard any commits you make in this 2025-09-07T06:13:38.9088257Z state without impacting any branches by switching back to a branch. 2025-09-07T06:13:38.9088670Z 2025-09-07T06:13:38.9088907Z If you want to create a new branch to retain commits you create, you may 2025-09-07T06:13:38.9089470Z do so (now or later) by using -c with the switch command. Example: 2025-09-07T06:13:38.9089810Z 2025-09-07T06:13:38.9089934Z git switch -c 2025-09-07T06:13:38.9090164Z 2025-09-07T06:13:38.9090285Z Or undo this operation with: 2025-09-07T06:13:38.9090483Z 2025-09-07T06:13:38.9090583Z git switch - 2025-09-07T06:13:38.9090741Z 2025-09-07T06:13:38.9091001Z Turn off this advice by setting config variable advice.detachedHead to false 2025-09-07T06:13:38.9091389Z 2025-09-07T06:13:38.9091601Z HEAD is now at 93fb23d6fae Build vLLM nightly wheels (#162000) 2025-09-07T06:13:38.9174359Z ##[endgroup] 2025-09-07T06:13:38.9214710Z [command]/usr/bin/git log -1 --format=%H 2025-09-07T06:13:38.9239306Z 93fb23d6fae7c4e82c4239a1033e522088742634 2025-09-07T06:13:38.9345680Z ##[group]Run cd "${GITHUB_WORKSPACE}" 2025-09-07T06:13:38.9346086Z cd "${GITHUB_WORKSPACE}" 2025-09-07T06:13:38.9346412Z # Clean stale submodule dirs 2025-09-07T06:13:38.9346756Z if [ -z "${NO_SUDO}" ]; then 2025-09-07T06:13:38.9347175Z  sudo git submodule foreach --recursive git clean -ffdx 2025-09-07T06:13:38.9347694Z else 2025-09-07T06:13:38.9347997Z  git submodule 
foreach --recursive git clean -ffdx 2025-09-07T06:13:38.9348355Z fi 2025-09-07T06:13:38.9356945Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:38.9357322Z env: 2025-09-07T06:13:38.9357532Z PY_VERS: 3.12 2025-09-07T06:13:38.9357832Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:13:38.9358228Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:38.9358514Z BUILD_DEVICE: cu128 2025-09-07T06:13:38.9358761Z NO_SUDO: 2025-09-07T06:13:38.9358960Z ##[endgroup] 2025-09-07T06:13:39.4013816Z Prepare all required actions 2025-09-07T06:13:39.4014407Z Getting action download info 2025-09-07T06:13:39.5190127Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e) 2025-09-07T06:13:39.7140957Z ##[group]Run ./.github/actions/setup-linux 2025-09-07T06:13:39.7141340Z env: 2025-09-07T06:13:39.7141561Z PY_VERS: 3.12 2025-09-07T06:13:39.7141909Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:13:39.7142351Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:39.7142667Z BUILD_DEVICE: cu128 2025-09-07T06:13:39.7142941Z ##[endgroup] 2025-09-07T06:13:39.7196907Z ##[group]Run set -euo pipefail 2025-09-07T06:13:39.7197296Z set -euo pipefail 2025-09-07T06:13:39.7197610Z function get_ec2_metadata() { 2025-09-07T06:13:39.7198038Z  # Pulled from instance metadata endpoint for EC2 2025-09-07T06:13:39.7198800Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-09-07T06:13:39.7199430Z  category=$1 2025-09-07T06:13:39.7199826Z  # If it is GCP runner (runner name contains gcp), do not run this 2025-09-07T06:13:39.7200323Z  runner_name_str=i-09e93d2ae04f2bbfa 2025-09-07T06:13:39.7200689Z  if [[ -f /.inarc ]]; then 2025-09-07T06:13:39.7201239Z  echo "ARC Runner, no info on ec2 metadata" 2025-09-07T06:13:39.7201664Z  elif [[ $runner_name_str == *"gcp"* ]]; then 2025-09-07T06:13:39.7202201Z  echo "Runner is from Google Cloud Platform, No info on 
ec2 metadata" 2025-09-07T06:13:39.7202674Z  else 2025-09-07T06:13:39.7203664Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-09-07T06:13:39.7204844Z  fi 2025-09-07T06:13:39.7205049Z } 2025-09-07T06:13:39.7205323Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-09-07T06:13:39.7205740Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-09-07T06:13:39.7206227Z echo "instance-type: $(get_ec2_metadata instance-type)" 2025-09-07T06:13:39.7206635Z echo "system info $(uname -a)" 2025-09-07T06:13:39.7212957Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:39.7213533Z env: 2025-09-07T06:13:39.7213759Z PY_VERS: 3.12 2025-09-07T06:13:39.7214126Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:13:39.7214547Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:39.7214870Z BUILD_DEVICE: cu128 2025-09-07T06:13:39.7215125Z ##[endgroup] 2025-09-07T06:13:39.7403929Z ami-id: ami-05ffe3c48a9991133 2025-09-07T06:13:39.7531329Z instance-id: i-09e93d2ae04f2bbfa 2025-09-07T06:13:39.7651518Z instance-type: r5.12xlarge 2025-09-07T06:13:39.7680686Z system info Linux ip-10-0-64-174.ec2.internal 6.1.141-155.222.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jun 17 10:29:47 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-09-07T06:13:39.7711841Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T06:13:39.7712838Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-09-07T06:13:39.7719298Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:39.7719709Z env: 2025-09-07T06:13:39.7719922Z PY_VERS: 3.12 2025-09-07T06:13:39.7720376Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 
2025-09-07T06:13:39.7720759Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:39.7721055Z BUILD_DEVICE: cu128 2025-09-07T06:13:39.7721304Z ##[endgroup] 2025-09-07T06:13:39.7788471Z ##[group]Run if systemctl is-active --quiet docker; then 2025-09-07T06:13:39.7788935Z if systemctl is-active --quiet docker; then 2025-09-07T06:13:39.7789465Z  echo "Docker daemon is running..."; 2025-09-07T06:13:39.7789802Z else 2025-09-07T06:13:39.7790148Z  echo "Starting docker daemon..." && sudo systemctl start docker; 2025-09-07T06:13:39.7790578Z fi 2025-09-07T06:13:39.7796996Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:39.7797436Z env: 2025-09-07T06:13:39.7797665Z PY_VERS: 3.12 2025-09-07T06:13:39.7797996Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:13:39.7798434Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:39.7798746Z BUILD_DEVICE: cu128 2025-09-07T06:13:39.7799020Z ##[endgroup] 2025-09-07T06:13:39.7914841Z Docker daemon is running... 2025-09-07T06:13:39.7965624Z ##[group]Run nick-fields/retry@v3.0.0 2025-09-07T06:13:39.7965935Z with: 2025-09-07T06:13:39.7966129Z shell: bash 2025-09-07T06:13:39.7966354Z timeout_minutes: 5 2025-09-07T06:13:39.7966583Z max_attempts: 3 2025-09-07T06:13:39.7966827Z retry_wait_seconds: 30 2025-09-07T06:13:39.7969151Z command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" # For LF Runners we need to make sure we also login to Meta's ECR docker registry too. 
META_AWS_ACCOUNT_ID=308535385114 if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" fi 2025-09-07T06:13:39.7971664Z polling_interval_seconds: 1 2025-09-07T06:13:39.7971938Z warning_on_retry: true 2025-09-07T06:13:39.7972208Z continue_on_error: false 2025-09-07T06:13:39.7972453Z env: 2025-09-07T06:13:39.7972759Z PY_VERS: 3.12 2025-09-07T06:13:39.7973226Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:13:39.7973679Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:39.7974038Z BUILD_DEVICE: cu128 2025-09-07T06:13:39.7974305Z AWS_RETRY_MODE: standard 2025-09-07T06:13:39.7974601Z AWS_MAX_ATTEMPTS: 5 2025-09-07T06:13:39.7974876Z AWS_DEFAULT_REGION: us-east-1 2025-09-07T06:13:39.7975184Z ##[endgroup] 2025-09-07T06:13:41.0151361Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-09-07T06:13:41.0152093Z Configure a credential helper to remove this warning. See 2025-09-07T06:13:41.0152720Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-09-07T06:13:41.0153163Z 2025-09-07T06:13:41.0153278Z Login Succeeded 2025-09-07T06:13:41.8894705Z Command completed after 1 attempt(s). 
2025-09-07T06:13:41.9003802Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:13:41.9004434Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:13:41.9005080Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-09-07T06:13:41.9013270Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:41.9013706Z env: 2025-09-07T06:13:41.9013967Z PY_VERS: 3.12 2025-09-07T06:13:41.9014326Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:13:41.9014785Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:41.9015138Z BUILD_DEVICE: cu128 2025-09-07T06:13:41.9015434Z ##[endgroup] 2025-09-07T06:13:41.9122779Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T06:13:41.9123410Z # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T06:13:41.9123852Z # shellcheck disable=SC2046 2025-09-07T06:13:41.9124218Z docker stop $(docker ps -q) || true 2025-09-07T06:13:41.9124592Z # Prune all of the docker images 2025-09-07T06:13:41.9124928Z docker system prune -af 2025-09-07T06:13:41.9130827Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:41.9131233Z env: 2025-09-07T06:13:41.9131648Z PY_VERS: 3.12 2025-09-07T06:13:41.9131964Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:13:41.9132376Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:41.9132806Z BUILD_DEVICE: cu128 2025-09-07T06:13:41.9133258Z ##[endgroup] 2025-09-07T06:13:41.9626888Z "docker stop" requires at least 1 argument. 2025-09-07T06:13:41.9627342Z See 'docker stop --help'. 2025-09-07T06:13:41.9627548Z 2025-09-07T06:13:41.9627725Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 
2025-09-07T06:13:41.9628024Z 2025-09-07T06:13:41.9628159Z Stop one or more running containers 2025-09-07T06:13:41.9871849Z Total reclaimed space: 0B 2025-09-07T06:13:41.9915954Z ##[group]Run set +e 2025-09-07T06:13:41.9916257Z set +e 2025-09-07T06:13:41.9916509Z set -x 2025-09-07T06:13:41.9916735Z  2025-09-07T06:13:41.9917001Z PT_DOMAIN=download.pytorch.org 2025-09-07T06:13:41.9917648Z # TODO: Flaky access to download.pytorch.org https://github.com/pytorch/pytorch/issues/100400, 2025-09-07T06:13:41.9918498Z # cleaning this up once the issue is fixed. There are more than one resolved IP here, the last 2025-09-07T06:13:41.9919089Z # one is returned at random 2025-09-07T06:13:41.9919512Z RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" | tail -n1) 2025-09-07T06:13:41.9920082Z  2025-09-07T06:13:41.9920326Z if [ -z "${RESOLVED_IP}" ]; then 2025-09-07T06:13:41.9920808Z  echo "Couldn't resolve ${PT_DOMAIN}, retrying with Google DNS..." 2025-09-07T06:13:41.9921379Z  RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" @8.8.8.8 | tail -n1) 2025-09-07T06:13:41.9921820Z  2025-09-07T06:13:41.9922085Z  if [ -z "${RESOLVED_IP}" ]; then 2025-09-07T06:13:41.9922500Z  echo "Couldn't resolve ${PT_DOMAIN}, exiting..." 
2025-09-07T06:13:41.9922901Z  exit 1 2025-09-07T06:13:41.9923248Z  fi 2025-09-07T06:13:41.9923597Z fi 2025-09-07T06:13:41.9923798Z  2025-09-07T06:13:41.9924056Z if grep -r "${PT_DOMAIN}" /etc/hosts; then 2025-09-07T06:13:41.9924417Z  # Clean up any old records first 2025-09-07T06:13:41.9924786Z  sudo sed -i "/${PT_DOMAIN}/d" /etc/hosts 2025-09-07T06:13:41.9925120Z fi 2025-09-07T06:13:41.9925315Z  2025-09-07T06:13:41.9925635Z echo "${RESOLVED_IP} ${PT_DOMAIN}" | sudo tee -a /etc/hosts 2025-09-07T06:13:41.9926026Z cat /etc/hosts 2025-09-07T06:13:41.9931725Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:41.9932097Z env: 2025-09-07T06:13:41.9932456Z PY_VERS: 3.12 2025-09-07T06:13:41.9933043Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:13:41.9933677Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:41.9934072Z BUILD_DEVICE: cu128 2025-09-07T06:13:41.9934342Z ##[endgroup] 2025-09-07T06:13:41.9960507Z + PT_DOMAIN=download.pytorch.org 2025-09-07T06:13:41.9967360Z ++ dig -4 +short download.pytorch.org 2025-09-07T06:13:41.9968273Z ++ tail -n1 2025-09-07T06:13:42.0649307Z + RESOLVED_IP=18.160.10.36 2025-09-07T06:13:42.0650032Z + '[' -z 18.160.10.36 ']' 2025-09-07T06:13:42.0650493Z + grep -r download.pytorch.org /etc/hosts 2025-09-07T06:13:42.0669246Z + echo '18.160.10.36 download.pytorch.org' 2025-09-07T06:13:42.0669932Z + sudo tee -a /etc/hosts 2025-09-07T06:13:42.1945323Z 18.160.10.36 download.pytorch.org 2025-09-07T06:13:42.1961845Z + cat /etc/hosts 2025-09-07T06:13:42.1971416Z 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 2025-09-07T06:13:42.1976513Z ::1 localhost6 localhost6.localdomain6 2025-09-07T06:13:42.1976944Z 18.160.10.36 download.pytorch.org 2025-09-07T06:13:42.2020681Z ##[group]Run set -eux 2025-09-07T06:13:42.2021080Z set -eux 2025-09-07T06:13:42.2021343Z  2025-09-07T06:13:42.2021758Z # Keep PyTorch nightly wheel here so that we can install it later during 
2025-09-07T06:13:42.2022301Z # vLLM build process 2025-09-07T06:13:42.2022660Z mkdir -p "${RUNNER_TEMP}/artifacts/" 2025-09-07T06:13:42.2023016Z  2025-09-07T06:13:42.2023283Z container_name=$(docker run \ 2025-09-07T06:13:42.2023619Z  --tty \ 2025-09-07T06:13:42.2023901Z  --detach \ 2025-09-07T06:13:42.2024311Z  -e PLATFORM \ 2025-09-07T06:13:42.2024875Z  -v "${GITHUB_WORKSPACE}:/pytorch" \ 2025-09-07T06:13:42.2025731Z  -v "${RUNNER_TEMP}/artifacts:/artifacts" \ 2025-09-07T06:13:42.2026450Z  -w /artifacts/ \ 2025-09-07T06:13:42.2027011Z  "${MANYLINUX_IMAGE}" 2025-09-07T06:13:42.2027484Z ) 2025-09-07T06:13:42.2027777Z  2025-09-07T06:13:42.2028221Z # Determine python executable for given version (copied from build-triton-wheel) 2025-09-07T06:13:42.2028788Z case $PY_VERS in 2025-09-07T06:13:42.2029068Z 3.10) 2025-09-07T06:13:42.2029421Z  PYTHON_EXECUTABLE=/opt/python/cp310-cp310/bin/python 2025-09-07T06:13:42.2029850Z  ;; 2025-09-07T06:13:42.2030078Z 3.11) 2025-09-07T06:13:42.2030423Z  PYTHON_EXECUTABLE=/opt/python/cp311-cp311/bin/python 2025-09-07T06:13:42.2030832Z  ;; 2025-09-07T06:13:42.2031073Z 3.12) 2025-09-07T06:13:42.2031405Z  PYTHON_EXECUTABLE=/opt/python/cp312-cp312/bin/python 2025-09-07T06:13:42.2032112Z  ;; 2025-09-07T06:13:42.2032336Z 3.13) 2025-09-07T06:13:42.2032682Z  PYTHON_EXECUTABLE=/opt/python/cp313-cp313/bin/python 2025-09-07T06:13:42.2033103Z  ;; 2025-09-07T06:13:42.2033329Z 3.13t) 2025-09-07T06:13:42.2033695Z  PYTHON_EXECUTABLE=/opt/python/cp313-cp313t/bin/python 2025-09-07T06:13:42.2034106Z  ;; 2025-09-07T06:13:42.2034351Z 3.14) 2025-09-07T06:13:42.2034684Z  PYTHON_EXECUTABLE=/opt/python/cp314-cp314/bin/python 2025-09-07T06:13:42.2035110Z  ;; 2025-09-07T06:13:42.2035335Z 3.14t) 2025-09-07T06:13:42.2035686Z  PYTHON_EXECUTABLE=/opt/python/cp314-cp314t/bin/python 2025-09-07T06:13:42.2036105Z  ;; 2025-09-07T06:13:42.2036323Z *) 2025-09-07T06:13:42.2036627Z  echo "Unsupported python version ${PY_VERS}" 2025-09-07T06:13:42.2036999Z  exit 1 
2025-09-07T06:13:42.2037245Z  ;; 2025-09-07T06:13:42.2037463Z esac 2025-09-07T06:13:42.2037698Z  2025-09-07T06:13:42.2038085Z docker exec -t "${container_name}" "${PYTHON_EXECUTABLE}" -mpip install \ 2025-09-07T06:13:42.2038636Z  --pre torch torchvision torchaudio \ 2025-09-07T06:13:42.2039208Z  --index-url "https://download.pytorch.org/whl/nightly/${BUILD_DEVICE}" 2025-09-07T06:13:42.2039706Z  2025-09-07T06:13:42.2040087Z # I wonder if there is a command to both download and install the wheels 2025-09-07T06:13:42.2040559Z # in one go 2025-09-07T06:13:42.2040999Z docker exec -t "${container_name}" "${PYTHON_EXECUTABLE}" -mpip download \ 2025-09-07T06:13:42.2041534Z  --pre torch torchvision torchaudio \ 2025-09-07T06:13:42.2042084Z  --index-url "https://download.pytorch.org/whl/nightly/${BUILD_DEVICE}" 2025-09-07T06:13:42.2042589Z  2025-09-07T06:13:42.2042820Z # Save this for later 2025-09-07T06:13:42.2043270Z echo "PYTHON_EXECUTABLE=${PYTHON_EXECUTABLE}" >> "$GITHUB_ENV" 2025-09-07T06:13:42.2043833Z echo "container_name=${container_name}" >> "$GITHUB_ENV" 2025-09-07T06:13:42.2051077Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:13:42.2051467Z env: 2025-09-07T06:13:42.2051829Z PY_VERS: 3.12 2025-09-07T06:13:42.2052151Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:13:42.2052566Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:13:42.2053170Z BUILD_DEVICE: cu128 2025-09-07T06:13:42.2053453Z ##[endgroup] 2025-09-07T06:13:42.2080885Z + mkdir -p /home/ec2-user/actions-runner/_work/_temp/artifacts/ 2025-09-07T06:13:42.2098441Z ++ docker run --tty --detach -e PLATFORM -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/pytorch -v /home/ec2-user/actions-runner/_work/_temp/artifacts:/artifacts -w /artifacts/ pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:13:42.2257968Z Unable to find image 'pytorch/manylinux2_28-builder:cuda12.8' locally 2025-09-07T06:13:42.3778805Z cuda12.8: Pulling from 
pytorch/manylinux2_28-builder 2025-09-07T06:13:42.3785171Z 401a23685cb3: Pulling fs layer 2025-09-07T06:13:42.3785864Z b05aefa4545a: Pulling fs layer 2025-09-07T06:13:42.3786258Z bc06c8fe1979: Pulling fs layer 2025-09-07T06:13:42.3786678Z 4d299f26f1e5: Pulling fs layer 2025-09-07T06:13:42.3787060Z 17389542b128: Pulling fs layer 2025-09-07T06:13:42.3787368Z 7214d0a5f579: Pulling fs layer 2025-09-07T06:13:42.3787662Z c19c6935f91a: Pulling fs layer 2025-09-07T06:13:42.3787978Z 53d7e1600e77: Pulling fs layer 2025-09-07T06:13:42.3788276Z b20393cddb26: Pulling fs layer 2025-09-07T06:13:42.3788578Z 4d299f26f1e5: Waiting 2025-09-07T06:13:42.3788829Z 17389542b128: Waiting 2025-09-07T06:13:42.3789105Z 64f2d9b981e0: Pulling fs layer 2025-09-07T06:13:42.3789441Z 7214d0a5f579: Waiting 2025-09-07T06:13:42.3789705Z 2185f86da86b: Pulling fs layer 2025-09-07T06:13:42.3800550Z 0df452dfdd6a: Pulling fs layer 2025-09-07T06:13:42.3801244Z c19c6935f91a: Waiting 2025-09-07T06:13:42.3801530Z b20393cddb26: Waiting 2025-09-07T06:13:42.3801802Z 7825122a7628: Pulling fs layer 2025-09-07T06:13:42.3802108Z 53d7e1600e77: Waiting 2025-09-07T06:13:42.3802360Z 64f2d9b981e0: Waiting 2025-09-07T06:13:42.3802627Z 2185f86da86b: Waiting 2025-09-07T06:13:42.3802880Z 0df452dfdd6a: Waiting 2025-09-07T06:13:42.3803161Z 67434c063477: Pulling fs layer 2025-09-07T06:13:42.3803475Z 0f4d2119bf89: Pulling fs layer 2025-09-07T06:13:42.3803784Z c21c0813dcd7: Pulling fs layer 2025-09-07T06:13:42.3804234Z 8495f8559677: Pulling fs layer 2025-09-07T06:13:42.3804527Z 90d71e6c40aa: Pulling fs layer 2025-09-07T06:13:42.3804836Z 07985ff4222d: Pulling fs layer 2025-09-07T06:13:42.3805124Z 5942d85afc57: Pulling fs layer 2025-09-07T06:13:42.3805431Z e4dffa436623: Pulling fs layer 2025-09-07T06:13:42.3805714Z 07985ff4222d: Waiting 2025-09-07T06:13:42.3805965Z 67434c063477: Waiting 2025-09-07T06:13:42.3806209Z 90d71e6c40aa: Waiting 2025-09-07T06:13:42.3806465Z 0f4d2119bf89: Waiting 2025-09-07T06:13:42.3806711Z 8495f8559677: 
Waiting 2025-09-07T06:13:42.3806961Z c21c0813dcd7: Waiting 2025-09-07T06:13:42.3807226Z f9f693df7fa4: Pulling fs layer 2025-09-07T06:13:42.3807520Z 5942d85afc57: Waiting 2025-09-07T06:13:42.3807781Z e4dffa436623: Waiting 2025-09-07T06:13:42.3808040Z aa298462eb75: Pulling fs layer 2025-09-07T06:13:42.3808356Z 4f4fb700ef54: Pulling fs layer 2025-09-07T06:13:42.3808642Z f9f693df7fa4: Waiting 2025-09-07T06:13:42.3808902Z aa298462eb75: Waiting 2025-09-07T06:13:42.3809161Z 7715b01d2ee7: Pulling fs layer 2025-09-07T06:13:42.3809456Z 4f4fb700ef54: Waiting 2025-09-07T06:13:42.3809714Z bc3f61192a8d: Pulling fs layer 2025-09-07T06:13:42.3810026Z 6af4ba5fb255: Pulling fs layer 2025-09-07T06:13:42.3810304Z 7715b01d2ee7: Waiting 2025-09-07T06:13:42.3810563Z bc3f61192a8d: Waiting 2025-09-07T06:13:42.3810821Z 6af4ba5fb255: Waiting 2025-09-07T06:13:42.9544802Z bc06c8fe1979: Verifying Checksum 2025-09-07T06:13:42.9545332Z bc06c8fe1979: Download complete 2025-09-07T06:13:43.2043819Z 401a23685cb3: Verifying Checksum 2025-09-07T06:13:43.2044398Z 401a23685cb3: Download complete 2025-09-07T06:13:43.2499279Z 17389542b128: Download complete 2025-09-07T06:13:43.3421554Z b05aefa4545a: Verifying Checksum 2025-09-07T06:13:43.3421999Z b05aefa4545a: Download complete 2025-09-07T06:13:43.3959728Z c19c6935f91a: Download complete 2025-09-07T06:13:43.4396769Z 53d7e1600e77: Download complete 2025-09-07T06:13:43.4549411Z 7214d0a5f579: Download complete 2025-09-07T06:13:43.4942944Z 64f2d9b981e0: Verifying Checksum 2025-09-07T06:13:43.4943348Z 64f2d9b981e0: Download complete 2025-09-07T06:13:45.1485124Z 401a23685cb3: Pull complete 2025-09-07T06:13:45.2473249Z 4d299f26f1e5: Verifying Checksum 2025-09-07T06:13:45.2473643Z 4d299f26f1e5: Download complete 2025-09-07T06:13:45.3285003Z 0df452dfdd6a: Verifying Checksum 2025-09-07T06:13:45.3285426Z 0df452dfdd6a: Download complete 2025-09-07T06:13:45.3919797Z 7825122a7628: Download complete 2025-09-07T06:13:45.4231588Z 67434c063477: Verifying Checksum 
2025-09-07T06:13:45.4232021Z 67434c063477: Download complete 2025-09-07T06:13:45.4955758Z 0f4d2119bf89: Download complete 2025-09-07T06:13:45.5439539Z c21c0813dcd7: Download complete 2025-09-07T06:13:45.6422905Z 8495f8559677: Verifying Checksum 2025-09-07T06:13:45.6423324Z 8495f8559677: Download complete 2025-09-07T06:13:45.6866742Z b20393cddb26: Verifying Checksum 2025-09-07T06:13:45.6867193Z b20393cddb26: Download complete 2025-09-07T06:13:45.7026454Z 90d71e6c40aa: Verifying Checksum 2025-09-07T06:13:45.7026881Z 90d71e6c40aa: Download complete 2025-09-07T06:13:45.7297590Z 07985ff4222d: Download complete 2025-09-07T06:13:45.7462827Z 5942d85afc57: Download complete 2025-09-07T06:13:45.7984358Z b05aefa4545a: Pull complete 2025-09-07T06:13:46.0854441Z f9f693df7fa4: Verifying Checksum 2025-09-07T06:13:46.0854894Z f9f693df7fa4: Download complete 2025-09-07T06:13:46.1204013Z 2185f86da86b: Verifying Checksum 2025-09-07T06:13:46.1204443Z 2185f86da86b: Download complete 2025-09-07T06:13:46.1551730Z 4f4fb700ef54: Download complete 2025-09-07T06:13:46.6091216Z aa298462eb75: Verifying Checksum 2025-09-07T06:13:46.9009574Z bc06c8fe1979: Pull complete 2025-09-07T06:13:48.6631091Z bc3f61192a8d: Verifying Checksum 2025-09-07T06:13:48.6631504Z bc3f61192a8d: Download complete 2025-09-07T06:13:48.6890030Z 6af4ba5fb255: Verifying Checksum 2025-09-07T06:13:48.6890470Z 6af4ba5fb255: Download complete 2025-09-07T06:13:50.3486069Z e4dffa436623: Verifying Checksum 2025-09-07T06:13:50.3486525Z e4dffa436623: Download complete 2025-09-07T06:13:51.0868261Z 4d299f26f1e5: Pull complete 2025-09-07T06:13:51.1109132Z 17389542b128: Pull complete 2025-09-07T06:13:51.2726866Z 7214d0a5f579: Pull complete 2025-09-07T06:13:51.2967203Z c19c6935f91a: Pull complete 2025-09-07T06:13:51.3183673Z 53d7e1600e77: Pull complete 2025-09-07T06:13:58.9469211Z b20393cddb26: Pull complete 2025-09-07T06:13:58.9705033Z 64f2d9b981e0: Pull complete 2025-09-07T06:14:02.5748306Z 2185f86da86b: Pull complete 
2025-09-07T06:14:02.6027737Z 0df452dfdd6a: Pull complete 2025-09-07T06:14:02.6271963Z 7825122a7628: Pull complete 2025-09-07T06:14:02.6487808Z 67434c063477: Pull complete 2025-09-07T06:14:02.6730563Z 0f4d2119bf89: Pull complete 2025-09-07T06:14:02.6959320Z c21c0813dcd7: Pull complete 2025-09-07T06:14:02.7549465Z 8495f8559677: Pull complete 2025-09-07T06:14:02.7775878Z 90d71e6c40aa: Pull complete 2025-09-07T06:14:02.8009684Z 07985ff4222d: Pull complete 2025-09-07T06:14:02.8231383Z 5942d85afc57: Pull complete 2025-09-07T06:14:13.1572003Z e4dffa436623: Pull complete 2025-09-07T06:14:14.0718714Z f9f693df7fa4: Pull complete 2025-09-07T06:14:14.9648562Z aa298462eb75: Pull complete 2025-09-07T06:14:14.9890785Z 4f4fb700ef54: Pull complete 2025-09-07T06:15:06.1454996Z 7715b01d2ee7: Verifying Checksum 2025-09-07T06:15:06.2039715Z 7715b01d2ee7: Download complete 2025-09-07T06:16:18.1686824Z 7715b01d2ee7: Pull complete 2025-09-07T06:16:20.2452930Z bc3f61192a8d: Pull complete 2025-09-07T06:16:20.6591689Z 6af4ba5fb255: Pull complete 2025-09-07T06:16:20.8817131Z Digest: sha256:4d39d04594f7fb158015aedb72ea01cc710b592793e002a13682d23e2d50ce6d 2025-09-07T06:16:20.9931363Z Status: Downloaded newer image for pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:17:28.8003142Z + container_name=fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be 2025-09-07T06:17:28.8005145Z + case $PY_VERS in 2025-09-07T06:17:28.8007901Z + PYTHON_EXECUTABLE=/opt/python/cp312-cp312/bin/python 2025-09-07T06:17:28.8009429Z + docker exec -t fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be /opt/python/cp312-cp312/bin/python -mpip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 2025-09-07T06:17:29.3078043Z Looking in indexes: https://download.pytorch.org/whl/nightly/cu128 2025-09-07T06:17:29.4423062Z Collecting torch 2025-09-07T06:17:29.4469341Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/torch-2.9.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-09-07T06:17:29.5794481Z Collecting torchvision 2025-09-07T06:17:29.5843001Z Downloading https://download.pytorch.org/whl/nightly/cu128/torchvision-0.24.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (5.9 kB) 2025-09-07T06:17:29.6924336Z Collecting torchaudio 2025-09-07T06:17:29.6965932Z Downloading https://download.pytorch.org/whl/nightly/cu128/torchaudio-2.8.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (6.9 kB) 2025-09-07T06:17:29.7312967Z Collecting filelock (from torch) 2025-09-07T06:17:29.7348938Z Downloading https://download.pytorch.org/whl/nightly/filelock-3.19.1-py3-none-any.whl.metadata (2.1 kB) 2025-09-07T06:17:29.8105311Z Collecting typing-extensions>=4.10.0 (from torch) 2025-09-07T06:17:29.8135492Z Downloading https://download.pytorch.org/whl/nightly/typing_extensions-4.14.1-py3-none-any.whl.metadata (3.0 kB) 2025-09-07T06:17:29.8215150Z Requirement already satisfied: setuptools in /opt/python/cp312-cp312/lib/python3.12/site-packages (from torch) (80.9.0) 2025-09-07T06:17:29.8565234Z Collecting sympy>=1.13.3 (from torch) 2025-09-07T06:17:29.8615718Z Downloading https://download.pytorch.org/whl/nightly/sympy-1.14.0-py3-none-any.whl.metadata (12 kB) 2025-09-07T06:17:29.8984952Z Collecting networkx>=2.5.1 (from torch) 2025-09-07T06:17:29.9018957Z Downloading https://download.pytorch.org/whl/nightly/networkx-3.5-py3-none-any.whl.metadata (6.3 kB) 2025-09-07T06:17:29.9554103Z Collecting jinja2 (from torch) 2025-09-07T06:17:29.9591177Z Downloading https://download.pytorch.org/whl/nightly/jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB) 2025-09-07T06:17:29.9877660Z Collecting fsspec>=0.8.5 (from torch) 2025-09-07T06:17:29.9911124Z Downloading https://download.pytorch.org/whl/nightly/fsspec-2025.7.0-py3-none-any.whl.metadata (12 kB) 2025-09-07T06:17:30.0568986Z Collecting 
nvidia-cuda-nvrtc-cu12==12.8.93 (from torch) 2025-09-07T06:17:30.0621365Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:17:30.0966509Z Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch) 2025-09-07T06:17:30.1013238Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:17:30.1377825Z Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch) 2025-09-07T06:17:30.1418112Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:17:30.1749000Z Collecting nvidia-cudnn-cu12==9.10.2.21 (from torch) 2025-09-07T06:17:30.1790614Z Downloading https://download.pytorch.org/whl/nightly/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:17:30.2090044Z Collecting nvidia-cublas-cu12==12.8.4.1 (from torch) 2025-09-07T06:17:30.2136164Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:17:30.2501274Z Collecting nvidia-cufft-cu12==11.3.3.83 (from torch) 2025-09-07T06:17:30.2749161Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:17:30.3091033Z Collecting nvidia-curand-cu12==10.3.9.90 (from torch) 2025-09-07T06:17:30.3336749Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:17:30.3654281Z Collecting nvidia-cusolver-cu12==11.7.3.90 (from torch) 2025-09-07T06:17:30.3708792Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:17:30.4194222Z Collecting nvidia-cusparse-cu12==12.5.8.93 (from torch) 2025-09-07T06:17:30.4393877Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:17:30.4697591Z Collecting nvidia-cusparselt-cu12==0.7.1 (from torch) 2025-09-07T06:17:30.4858360Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata (7.0 kB) 2025-09-07T06:17:30.5216234Z Collecting nvidia-nccl-cu12==2.27.5 (from torch) 2025-09-07T06:17:30.5280770Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) 2025-09-07T06:17:30.5610136Z Collecting nvidia-nvshmem-cu12==3.3.20 (from torch) 2025-09-07T06:17:30.5656548Z Downloading https://download.pytorch.org/whl/nightly/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.1 kB) 2025-09-07T06:17:30.5978812Z Collecting nvidia-nvtx-cu12==12.8.90 (from torch) 2025-09-07T06:17:30.6039224Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:17:30.6359762Z Collecting nvidia-nvjitlink-cu12==12.8.93 (from torch) 2025-09-07T06:17:30.6496067Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:17:30.6814575Z Collecting nvidia-cufile-cu12==1.13.1.3 (from torch) 2025-09-07T06:17:30.7299500Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 
kB) 2025-09-07T06:17:30.7877329Z Collecting pytorch-triton==3.4.0+gitf7888497 (from torch) 2025-09-07T06:17:30.7908513Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.4.0%2Bgitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:17:30.8820171Z Collecting numpy (from torchvision) 2025-09-07T06:17:30.8872403Z Downloading https://download.pytorch.org/whl/nightly/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB) 2025-09-07T06:17:30.9535723Z Collecting pillow!=8.3.*,>=5.3.0 (from torchvision) 2025-09-07T06:17:30.9580700Z Downloading https://download.pytorch.org/whl/nightly/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (9.0 kB) 2025-09-07T06:17:31.0068207Z Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch) 2025-09-07T06:17:31.0117029Z Downloading https://download.pytorch.org/whl/nightly/mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-09-07T06:17:31.0247957Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/536.2 kB ? eta -:--:-- 2025-09-07T06:17:31.0248907Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 31.7 MB/s 0:00:00 2025-09-07T06:17:31.0694463Z [?25hCollecting MarkupSafe>=2.0 (from jinja2->torch) 2025-09-07T06:17:31.0731094Z Downloading https://download.pytorch.org/whl/nightly/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB) 2025-09-07T06:17:31.2251465Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.9.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl (900.5 MB) 2025-09-07T06:17:31.4288147Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/900.5 MB ? 
eta -:--:-- 2025-09-07T06:17:55.7872581Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 900.5/900.5 MB 20.1 MB/s 0:00:24 2025-09-07T06:17:55.7917763Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl (594.3 MB) 2025-09-07T06:17:55.9946037Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/594.3 MB ? 
eta -:--:-- 2025-09-07T06:18:02.0019149Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 594.3/594.3 MB 47.3 MB/s 0:00:06 2025-09-07T06:18:02.0064392Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB) 2025-09-07T06:18:02.0519459Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/10.2 MB ? 
eta -:--:-- 2025-09-07T06:18:02.0520271Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 238.3 MB/s 0:00:00 2025-09-07T06:18:02.0559259Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB) 2025-09-07T06:18:02.2585164Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/88.0 MB ? eta -:--:-- 2025-09-07T06:18:02.6995904Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.0/88.0 MB 137.1 MB/s 0:00:00 2025-09-07T06:18:02.7786998Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB) 2025-09-07T06:18:02.7915868Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/954.8 kB ? eta -:--:-- 2025-09-07T06:18:02.7916696Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 954.8/954.8 kB 84.3 MB/s 0:00:00 2025-09-07T06:18:02.7961647Z [?25hDownloading https://download.pytorch.org/whl/nightly/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl (706.8 MB) 2025-09-07T06:18:02.9994089Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/706.8 MB ? 
eta -:--:-- 2025-09-07T06:18:08.8415795Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 706.8/706.8 MB 50.1 MB/s 0:00:06 2025-09-07T06:18:08.8455143Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB) 2025-09-07T06:18:09.0486679Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/193.1 MB ? 
eta -:--:-- 2025-09-07T06:18:10.2677258Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.1/193.1 MB 136.0 MB/s 0:00:01 2025-09-07T06:18:10.2736373Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) 2025-09-07T06:18:10.2846136Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/1.2 MB ? eta -:--:-- 2025-09-07T06:18:10.2847240Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 126.7 MB/s 0:00:00 2025-09-07T06:18:10.2894783Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl (63.6 MB) 2025-09-07T06:18:10.4928698Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/63.6 MB ? 
eta -:--:-- 2025-09-07T06:18:10.7479680Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.6/63.6 MB 139.2 MB/s 0:00:00 2025-09-07T06:18:10.7523662Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl (267.5 MB) 2025-09-07T06:18:10.9555850Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/267.5 MB ? eta -:--:-- 2025-09-07T06:18:12.8043541Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 267.5/267.5 MB 128.5 MB/s 0:00:02 2025-09-07T06:18:12.8082493Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (288.2 MB) 
2025-09-07T06:18:13.0114152Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/288.2 MB ? eta -:--:-- 2025-09-07T06:18:15.0008517Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 288.2/288.2 MB 122.6 MB/s 0:00:02 2025-09-07T06:18:15.0054012Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl (287.2 MB) 2025-09-07T06:18:15.2084437Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/287.2 MB ? 
eta -:--:-- 2025-09-07T06:18:17.1886745Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 287.2/287.2 MB 123.2 MB/s 0:00:02 2025-09-07T06:18:17.1931517Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.3 MB) 2025-09-07T06:18:17.3963587Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/322.3 MB ? 
eta -:--:-- 2025-09-07T06:18:19.6598381Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 322.3/322.3 MB 111.7 MB/s 0:00:02 2025-09-07T06:18:19.6654349Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.3 MB) 2025-09-07T06:18:19.8687628Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/39.3 MB ? 
eta -:--:-- 2025-09-07T06:18:19.9374501Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 39.1/39.3 MB 513.5 MB/s eta 0:00:01 2025-09-07T06:18:19.9375386Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.3/39.3 MB 145.1 MB/s 0:00:00 2025-09-07T06:18:19.9419713Z [?25hDownloading https://download.pytorch.org/whl/nightly/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (124.7 MB) 2025-09-07T06:18:20.1453695Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/124.7 MB ? eta -:--:-- 2025-09-07T06:18:20.3472550Z  ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 41.4/124.7 MB 206.9 MB/s eta 0:00:01 2025-09-07T06:18:20.5487479Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 124.5/124.7 MB 351.6 MB/s eta 0:00:01 2025-09-07T06:18:20.7508295Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 124.5/124.7 MB 351.6 MB/s eta 0:00:01 2025-09-07T06:18:20.8959002Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 124.5/124.7 MB 351.6 MB/s eta 0:00:01 2025-09-07T06:18:20.8959892Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.7/124.7 MB 130.9 MB/s 0:00:00 2025-09-07T06:18:20.9005421Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) 2025-09-07T06:18:21.0042958Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.4.0%2Bgitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (155.6 MB) 2025-09-07T06:18:21.2074581Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/155.6 MB ? 
eta -:--:-- 2025-09-07T06:18:21.4089623Z  ━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.2/155.6 MB 80.3 MB/s eta 0:00:02 2025-09-07T06:18:21.6099065Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.5/155.6 MB 54.9 MB/s eta 0:00:03 2025-09-07T06:18:21.8118514Z  ━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 28.0/155.6 MB 46.2 MB/s eta 0:00:03 2025-09-07T06:18:22.0136436Z  ━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 41.7/155.6 MB 53.1 MB/s eta 0:00:03 2025-09-07T06:18:22.2156392Z  ━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.1/155.6 MB 55.3 MB/s eta 0:00:02 2025-09-07T06:18:22.4171101Z  ━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━ 65.3/155.6 MB 56.7 MB/s eta 0:00:02 2025-09-07T06:18:22.6185217Z  ━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━ 73.7/155.6 MB 53.6 MB/s eta 0:00:02 2025-09-07T06:18:22.8197040Z  ━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━ 83.6/155.6 MB 53.6 MB/s eta 0:00:02 2025-09-07T06:18:23.0216313Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━ 104.1/155.6 MB 57.3 MB/s eta 0:00:01 2025-09-07T06:18:23.2230128Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━ 115.6/155.6 MB 59.1 MB/s eta 0:00:01 2025-09-07T06:18:23.4248221Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━ 125.6/155.6 MB 56.9 MB/s eta 0:00:01 2025-09-07T06:18:23.6279802Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━ 140.8/155.6 MB 58.8 MB/s eta 0:00:01 2025-09-07T06:18:23.8294969Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━ 150.7/155.6 MB 57.9 MB/s eta 0:00:01 2025-09-07T06:18:23.9856692Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 155.5/155.6 MB 57.5 MB/s eta 0:00:01 2025-09-07T06:18:23.9857583Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.6/155.6 MB 52.2 MB/s 0:00:02 2025-09-07T06:18:23.9911953Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/torchvision-0.24.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl (8.2 MB) 2025-09-07T06:18:24.0655625Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/8.2 MB ? 
eta -:--:-- 2025-09-07T06:18:24.0656483Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.2/8.2 MB 110.7 MB/s 0:00:00 2025-09-07T06:18:24.0702721Z [?25hDownloading https://download.pytorch.org/whl/nightly/cu128/torchaudio-2.8.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl (2.0 MB) 2025-09-07T06:18:24.0824862Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/2.0 MB ? eta -:--:-- 2025-09-07T06:18:24.0825837Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 189.3 MB/s 0:00:00 2025-09-07T06:18:24.0860224Z [?25hDownloading https://download.pytorch.org/whl/nightly/fsspec-2025.7.0-py3-none-any.whl (199 kB) 2025-09-07T06:18:24.0935179Z Downloading https://download.pytorch.org/whl/nightly/networkx-3.5-py3-none-any.whl (2.0 MB) 2025-09-07T06:18:24.1059235Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/2.0 MB ? eta -:--:-- 2025-09-07T06:18:24.1060023Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 188.3 MB/s 0:00:00 2025-09-07T06:18:24.1102254Z [?25hDownloading https://download.pytorch.org/whl/nightly/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB) 2025-09-07T06:18:24.1419833Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/6.6 MB ? eta -:--:-- 2025-09-07T06:18:24.1420643Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 221.4 MB/s 0:00:00 2025-09-07T06:18:24.1462597Z [?25hDownloading https://download.pytorch.org/whl/nightly/sympy-1.14.0-py3-none-any.whl (6.3 MB) 2025-09-07T06:18:24.1751090Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/6.3 MB ? 
eta -:--:-- 2025-09-07T06:18:24.1751918Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 237.4 MB/s 0:00:00 2025-09-07T06:18:24.1787062Z [?25hDownloading https://download.pytorch.org/whl/nightly/typing_extensions-4.14.1-py3-none-any.whl (43 kB) 2025-09-07T06:18:24.1856629Z Downloading https://download.pytorch.org/whl/nightly/filelock-3.19.1-py3-none-any.whl (15 kB) 2025-09-07T06:18:24.1938994Z Downloading https://download.pytorch.org/whl/nightly/jinja2-3.1.6-py3-none-any.whl (134 kB) 2025-09-07T06:18:24.2015056Z Downloading https://download.pytorch.org/whl/nightly/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB) 2025-09-07T06:18:24.2098652Z Downloading https://download.pytorch.org/whl/nightly/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB) 2025-09-07T06:18:24.3031546Z [?25l ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/16.6 MB ? eta -:--:-- 2025-09-07T06:18:24.3032662Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.6/16.6 MB 182.5 MB/s 0:00:00 2025-09-07T06:18:34.8918885Z [?25hInstalling collected packages: nvidia-cusparselt-cu12, mpmath, typing-extensions, sympy, pytorch-triton, pillow, nvidia-nvtx-cu12, nvidia-nvshmem-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio 2025-09-07T06:18:35.0596843Z [?25l 2025-09-07T06:18:35.2276754Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:18:35.3953881Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:18:35.5628753Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12] 2025-09-07T06:18:35.7303168Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 
[nvidia-cusparselt-cu12]
2025-09-07T06:18:37.7431911Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0/29 [nvidia-cusparselt-cu12]
2025-09-07T06:18:37.9103551Z  ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  1/29 [mpmath]
2025-09-07T06:18:38.2465169Z  ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  1/29 [mpmath]
2025-09-07T06:18:38.4140624Z  ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  2/29 [typing-extensions]
2025-09-07T06:18:38.5817311Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy]
2025-09-07T06:18:45.4395970Z  ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  3/29 [sympy]
2025-09-07T06:18:45.6076392Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton]
2025-09-07T06:18:48.4614492Z  ━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  4/29 [pytorch-triton]
2025-09-07T06:18:48.6313786Z  ━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  5/29 [pillow]
2025-09-07T06:18:48.9667997Z  ━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  5/29 [pillow]
2025-09-07T06:18:49.1342199Z  ━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  6/29 [nvidia-nvtx-cu12]
2025-09-07T06:18:49.3019044Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12]
2025-09-07T06:18:50.4756740Z  ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  7/29 [nvidia-nvshmem-cu12]
2025-09-07T06:18:50.6433532Z  ━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━  8/29 [nvidia-nvjitlink-cu12]
2025-09-07T06:18:51.1461488Z  ━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━  8/29 [nvidia-nvjitlink-cu12]
2025-09-07T06:18:51.3138673Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12]
2025-09-07T06:18:54.6671585Z  ━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━  9/29 [nvidia-nccl-cu12]
2025-09-07T06:18:54.8347035Z  ━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━ 10/29 [nvidia-curand-cu12]
2025-09-07T06:18:55.3387853Z  ━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━ 10/29 [nvidia-curand-cu12]
2025-09-07T06:18:55.5061109Z  ━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━ 12/29 [nvidia-cuda-runtime-cu12]
2025-09-07T06:18:55.6738491Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 13/29 [nvidia-cuda-nvrtc-cu12]
2025-09-07T06:18:56.6798208Z  ━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━ 13/29 [nvidia-cuda-nvrtc-cu12]
2025-09-07T06:18:56.8478452Z  ━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━ 14/29 [nvidia-cuda-cupti-cu12]
2025-09-07T06:18:57.0154273Z  ━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━ 14/29 [nvidia-cuda-cupti-cu12]
2025-09-07T06:18:57.1829099Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12]
2025-09-07T06:19:01.3745872Z  ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 15/29 [nvidia-cublas-cu12]
2025-09-07T06:19:01.5422882Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy]
2025-09-07T06:19:03.5961780Z  ━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━ 16/29 [numpy]
2025-09-07T06:19:03.7638157Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 17/29 [networkx]
2025-09-07T06:19:05.1237319Z  ━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━ 17/29 [networkx]
2025-09-07T06:19:05.2915099Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━ 19/29 [fsspec]
2025-09-07T06:19:05.4591737Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12]
2025-09-07T06:19:08.4773733Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━ 21/29 [nvidia-cusparse-cu12]
2025-09-07T06:19:08.6450197Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12]
2025-09-07T06:19:10.3216161Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━ 22/29 [nvidia-cufft-cu12]
2025-09-07T06:19:10.4894852Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━ 23/29 [nvidia-cudnn-cu12]
2025-09-07T06:19:17.8654644Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━ 23/29 [nvidia-cudnn-cu12]
2025-09-07T06:19:18.0330305Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━ 25/29 [nvidia-cusolver-cu12]
2025-09-07T06:19:21.0612129Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━ 25/29 [nvidia-cusolver-cu12]
2025-09-07T06:19:21.2336433Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch]
2025-09-07T06:19:29.6178422Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch]
2025-09-07T06:19:29.7856224Z  
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:29.9535415Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:30.1214436Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:30.2891135Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:30.4569532Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:30.6245449Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:30.7920748Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:30.9595399Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:31.1273393Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:31.2947056Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:31.4622319Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:31.6299685Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:31.7976439Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:31.9656050Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:32.1334639Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:32.3011314Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:32.4688206Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:32.6369301Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:32.8045051Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:32.9722334Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:33.1395299Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:33.3075875Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:33.4752802Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:33.6431590Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 
26/29 [torch] 2025-09-07T06:19:33.8104932Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:33.9827768Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:34.1515649Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:34.3232823Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:34.4946766Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:34.6675767Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:34.8361148Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:35.0376810Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:35.2087947Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:35.3880157Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:35.5563932Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:35.7245106Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:35.9258015Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:36.1129046Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:36.3112911Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:36.4858247Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:36.6540820Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:36.8255955Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:36.9941767Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:37.1619403Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:37.3335007Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:37.5014340Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:37.6767432Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:37.8451258Z  
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:38.0129306Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:38.1811328Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:38.3491458Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:38.5170695Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:38.6851480Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:38.8527349Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:39.0210819Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:39.1891453Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:39.3566943Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:39.5289276Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:39.6980149Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:39.8704340Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:40.0586753Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:40.3614555Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:40.6931192Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:40.8694899Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:41.0372693Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:41.2055552Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:41.3736392Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:41.5414469Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:41.7093487Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:41.8770793Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:42.0784981Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 
26/29 [torch] 2025-09-07T06:19:42.2462561Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━ 26/29 [torch] 2025-09-07T06:19:42.4139857Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━ 27/29 [torchvision] 2025-09-07T06:19:42.5823911Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━ 27/29 [torchvision] 2025-09-07T06:19:42.7500798Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━ 27/29 [torchvision] 2025-09-07T06:19:42.9186407Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━ 27/29 [torchvision] 2025-09-07T06:19:43.0872736Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━ 27/29 [torchvision] 2025-09-07T06:19:43.2549341Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━ 28/29 [torchaudio] 2025-09-07T06:19:43.3406345Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━ 28/29 [torchaudio] 2025-09-07T06:19:43.3407390Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 29/29 [torchaudio] 2025-09-07T06:19:43.3407831Z [?25h 2025-09-07T06:19:43.3447984Z Successfully installed MarkupSafe-3.0.2 filelock-3.19.1 fsspec-2025.7.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 numpy-2.3.2 nvidia-cublas-cu12-12.8.4.1 nvidia-cuda-cupti-cu12-12.8.90 nvidia-cuda-nvrtc-cu12-12.8.93 nvidia-cuda-runtime-cu12-12.8.90 nvidia-cudnn-cu12-9.10.2.21 nvidia-cufft-cu12-11.3.3.83 nvidia-cufile-cu12-1.13.1.3 nvidia-curand-cu12-10.3.9.90 nvidia-cusolver-cu12-11.7.3.90 nvidia-cusparse-cu12-12.5.8.93 nvidia-cusparselt-cu12-0.7.1 nvidia-nccl-cu12-2.27.5 nvidia-nvjitlink-cu12-12.8.93 nvidia-nvshmem-cu12-3.3.20 nvidia-nvtx-cu12-12.8.90 pillow-11.3.0 pytorch-triton-3.4.0+gitf7888497 sympy-1.14.0 torch-2.9.0.dev20250906+cu128 torchaudio-2.8.0.dev20250906+cu128 torchvision-0.24.0.dev20250906+cu128 typing-extensions-4.14.1 2025-09-07T06:19:43.3453572Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. 
Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-09-07T06:19:43.6810289Z + docker exec -t fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be /opt/python/cp312-cp312/bin/python -mpip download --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 2025-09-07T06:19:44.2366678Z Looking in indexes: https://download.pytorch.org/whl/nightly/cu128 2025-09-07T06:19:44.3883785Z Collecting torch 2025-09-07T06:19:44.4159640Z Using cached https://download.pytorch.org/whl/nightly/cu128/torch-2.9.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-09-07T06:19:44.5312062Z Collecting torchvision 2025-09-07T06:19:44.5735805Z Using cached https://download.pytorch.org/whl/nightly/cu128/torchvision-0.24.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (5.9 kB) 2025-09-07T06:19:44.6852356Z Collecting torchaudio 2025-09-07T06:19:44.7294990Z Using cached https://download.pytorch.org/whl/nightly/cu128/torchaudio-2.8.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (6.9 kB) 2025-09-07T06:19:44.7677158Z Collecting filelock (from torch) 2025-09-07T06:19:44.8135495Z Using cached https://download.pytorch.org/whl/nightly/filelock-3.19.1-py3-none-any.whl.metadata (2.1 kB) 2025-09-07T06:19:44.8404918Z Collecting typing-extensions>=4.10.0 (from torch) 2025-09-07T06:19:44.8795696Z Using cached https://download.pytorch.org/whl/nightly/typing_extensions-4.14.1-py3-none-any.whl.metadata (3.0 kB) 2025-09-07T06:19:44.9212245Z Collecting setuptools (from torch) 2025-09-07T06:19:44.9256014Z Downloading https://download.pytorch.org/whl/nightly/setuptools-78.1.0-py3-none-any.whl.metadata (6.6 kB) 2025-09-07T06:19:45.0555702Z Collecting sympy>=1.13.3 (from torch) 2025-09-07T06:19:45.0942574Z Using cached https://download.pytorch.org/whl/nightly/sympy-1.14.0-py3-none-any.whl.metadata (12 kB) 2025-09-07T06:19:45.1210264Z Collecting 
networkx>=2.5.1 (from torch) 2025-09-07T06:19:45.1656256Z Using cached https://download.pytorch.org/whl/nightly/networkx-3.5-py3-none-any.whl.metadata (6.3 kB) 2025-09-07T06:19:45.2084112Z Collecting jinja2 (from torch) 2025-09-07T06:19:45.2559834Z Using cached https://download.pytorch.org/whl/nightly/jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB) 2025-09-07T06:19:45.2838981Z Collecting fsspec>=0.8.5 (from torch) 2025-09-07T06:19:45.3248267Z Using cached https://download.pytorch.org/whl/nightly/fsspec-2025.7.0-py3-none-any.whl.metadata (12 kB) 2025-09-07T06:19:45.3760958Z Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from torch) 2025-09-07T06:19:45.4155095Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:19:45.4453796Z Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch) 2025-09-07T06:19:45.4878099Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:19:45.5128990Z Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch) 2025-09-07T06:19:45.5571040Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:19:45.5854692Z Collecting nvidia-cudnn-cu12==9.10.2.21 (from torch) 2025-09-07T06:19:45.6297647Z Using cached https://download.pytorch.org/whl/nightly/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:19:45.6558698Z Collecting nvidia-cublas-cu12==12.8.4.1 (from torch) 2025-09-07T06:19:45.6994689Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:19:45.7253978Z Collecting nvidia-cufft-cu12==11.3.3.83 (from torch) 
2025-09-07T06:19:45.7686106Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:19:45.7951497Z Collecting nvidia-curand-cu12==10.3.9.90 (from torch) 2025-09-07T06:19:45.8436616Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:19:45.8679659Z Collecting nvidia-cusolver-cu12==11.7.3.90 (from torch) 2025-09-07T06:19:45.9155590Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:19:45.9414788Z Collecting nvidia-cusparse-cu12==12.5.8.93 (from torch) 2025-09-07T06:19:45.9887924Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:19:46.0129793Z Collecting nvidia-cusparselt-cu12==0.7.1 (from torch) 2025-09-07T06:19:46.0575616Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata (7.0 kB) 2025-09-07T06:19:46.0915559Z Collecting nvidia-nccl-cu12==2.27.5 (from torch) 2025-09-07T06:19:46.1308767Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) 2025-09-07T06:19:46.1677860Z Collecting nvidia-nvshmem-cu12==3.3.20 (from torch) 2025-09-07T06:19:46.2007027Z Using cached https://download.pytorch.org/whl/nightly/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.1 kB) 2025-09-07T06:19:46.2407407Z Collecting nvidia-nvtx-cu12==12.8.90 (from torch) 2025-09-07T06:19:46.2802925Z Using cached 
https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:19:46.3215195Z Collecting nvidia-nvjitlink-cu12==12.8.93 (from torch) 2025-09-07T06:19:46.3700867Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:19:46.4083327Z Collecting nvidia-cufile-cu12==1.13.1.3 (from torch) 2025-09-07T06:19:46.4575101Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-09-07T06:19:46.5175737Z Collecting pytorch-triton==3.4.0+gitf7888497 (from torch) 2025-09-07T06:19:46.5869874Z Using cached https://download.pytorch.org/whl/nightly/pytorch_triton-3.4.0%2Bgitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.8 kB) 2025-09-07T06:19:46.6527354Z Collecting numpy (from torchvision) 2025-09-07T06:19:46.6947572Z Using cached https://download.pytorch.org/whl/nightly/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB) 2025-09-07T06:19:46.7535689Z Collecting pillow!=8.3.*,>=5.3.0 (from torchvision) 2025-09-07T06:19:46.7995647Z Using cached https://download.pytorch.org/whl/nightly/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (9.0 kB) 2025-09-07T06:19:46.8472831Z Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch) 2025-09-07T06:19:46.8895509Z Using cached https://download.pytorch.org/whl/nightly/mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-09-07T06:19:46.9460536Z Collecting MarkupSafe>=2.0 (from jinja2->torch) 2025-09-07T06:19:46.9806867Z Using cached https://download.pytorch.org/whl/nightly/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB) 2025-09-07T06:19:47.0529890Z Using cached 
https://download.pytorch.org/whl/nightly/cu128/torch-2.9.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl (900.5 MB) 2025-09-07T06:19:47.5925729Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl (594.3 MB) 2025-09-07T06:19:47.9522516Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB) 2025-09-07T06:19:48.0001503Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB) 2025-09-07T06:19:48.0964178Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB) 2025-09-07T06:19:48.1494880Z Using cached https://download.pytorch.org/whl/nightly/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl (706.8 MB) 2025-09-07T06:19:48.5774002Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB) 2025-09-07T06:19:48.7395686Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) 2025-09-07T06:19:48.7965045Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl (63.6 MB) 2025-09-07T06:19:48.8858475Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl (267.5 MB) 2025-09-07T06:19:49.0855990Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (288.2 MB) 2025-09-07T06:19:49.3016692Z Using cached 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl (287.2 MB) 2025-09-07T06:19:49.4833638Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.3 MB) 2025-09-07T06:19:49.6771437Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.3 MB) 2025-09-07T06:19:49.7347713Z Using cached https://download.pytorch.org/whl/nightly/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (124.7 MB) 2025-09-07T06:19:49.8296117Z Using cached https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) 2025-09-07T06:19:49.8640668Z Using cached https://download.pytorch.org/whl/nightly/pytorch_triton-3.4.0%2Bgitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (155.6 MB) 2025-09-07T06:19:49.9755303Z Using cached https://download.pytorch.org/whl/nightly/cu128/torchvision-0.24.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl (8.2 MB) 2025-09-07T06:19:49.9987643Z Using cached https://download.pytorch.org/whl/nightly/cu128/torchaudio-2.8.0.dev20250906%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl (2.0 MB) 2025-09-07T06:19:50.0395101Z Using cached https://download.pytorch.org/whl/nightly/fsspec-2025.7.0-py3-none-any.whl (199 kB) 2025-09-07T06:19:50.0621904Z Using cached https://download.pytorch.org/whl/nightly/networkx-3.5-py3-none-any.whl (2.0 MB) 2025-09-07T06:19:50.1025881Z Using cached https://download.pytorch.org/whl/nightly/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.6 MB) 2025-09-07T06:19:50.1108091Z Downloading https://download.pytorch.org/whl/nightly/setuptools-78.1.0-py3-none-any.whl (1.3 MB)
2025-09-07T06:19:50.1999453Z  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 11.7 MB/s 0:00:00 2025-09-07T06:19:50.2330057Z Using cached https://download.pytorch.org/whl/nightly/sympy-1.14.0-py3-none-any.whl (6.3 MB) 2025-09-07T06:19:50.2645099Z Using cached https://download.pytorch.org/whl/nightly/typing_extensions-4.14.1-py3-none-any.whl (43 kB) 2025-09-07T06:19:50.2951127Z Using cached https://download.pytorch.org/whl/nightly/filelock-3.19.1-py3-none-any.whl (15 kB) 2025-09-07T06:19:50.3142673Z Using cached https://download.pytorch.org/whl/nightly/jinja2-3.1.6-py3-none-any.whl (134 kB) 2025-09-07T06:19:50.3527292Z Using cached https://download.pytorch.org/whl/nightly/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23 kB) 2025-09-07T06:19:50.3869444Z Using cached https://download.pytorch.org/whl/nightly/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB) 2025-09-07T06:20:01.3328734Z Saved ./torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:20:01.5948607Z Saved ./nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl 2025-09-07T06:20:01.5997599Z Saved ./nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:01.6394360Z Saved ./nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl 2025-09-07T06:20:01.6399531Z Saved ./nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:01.9508177Z Saved ./nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl 2025-09-07T06:20:02.0360296Z Saved ./nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:02.0365884Z Saved ./nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:02.0654415Z Saved ./nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl
2025-09-07T06:20:02.1825458Z Saved ./nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl 2025-09-07T06:20:02.3095318Z Saved ./nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:02.4355561Z Saved ./nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl 2025-09-07T06:20:02.5775421Z Saved ./nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:02.5955634Z Saved ./nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl 2025-09-07T06:20:02.6501727Z Saved ./nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:02.6504895Z Saved ./nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl 2025-09-07T06:20:02.7196212Z Saved ./pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T06:20:02.7238927Z Saved ./torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:20:02.7248996Z Saved ./torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:20:02.7251921Z Saved ./fsspec-2025.7.0-py3-none-any.whl 2025-09-07T06:20:02.7270367Z Saved ./networkx-3.5-py3-none-any.whl 2025-09-07T06:20:02.7299901Z Saved ./pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T06:20:02.7308444Z Saved ./setuptools-78.1.0-py3-none-any.whl 2025-09-07T06:20:02.7340449Z Saved ./sympy-1.14.0-py3-none-any.whl 2025-09-07T06:20:02.7345836Z Saved ./mpmath-1.3.0-py3-none-any.whl 2025-09-07T06:20:02.7349215Z Saved ./typing_extensions-4.14.1-py3-none-any.whl 2025-09-07T06:20:02.7352342Z Saved ./filelock-3.19.1-py3-none-any.whl 2025-09-07T06:20:02.7356009Z Saved ./jinja2-3.1.6-py3-none-any.whl 2025-09-07T06:20:02.7359498Z Saved ./MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 2025-09-07T06:20:02.7442776Z Saved 
./numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T06:20:02.7445460Z Successfully downloaded torch nvidia-cublas-cu12 nvidia-cuda-cupti-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12 nvidia-cufft-cu12 nvidia-cufile-cu12 nvidia-curand-cu12 nvidia-cusolver-cu12 nvidia-cusparse-cu12 nvidia-cusparselt-cu12 nvidia-nccl-cu12 nvidia-nvjitlink-cu12 nvidia-nvshmem-cu12 nvidia-nvtx-cu12 pytorch-triton torchvision torchaudio fsspec networkx pillow setuptools sympy mpmath typing-extensions filelock jinja2 MarkupSafe numpy 2025-09-07T06:20:03.1215287Z + echo PYTHON_EXECUTABLE=/opt/python/cp312-cp312/bin/python 2025-09-07T06:20:03.1217594Z + echo container_name=fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be 2025-09-07T06:20:03.1278867Z Prepare all required actions 2025-09-07T06:20:03.1332079Z ##[group]Run ./.github/actions/build-external-packages 2025-09-07T06:20:03.1332464Z with: 2025-09-07T06:20:03.1332809Z build-targets: vllm 2025-09-07T06:20:03.1333324Z docker-image: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:20:03.1333732Z cuda-arch-list: 8.0;8.9;9.0;10.0;12.0 2025-09-07T06:20:03.1334232Z torch-wheel-dir: /home/ec2-user/actions-runner/_work/_temp/artifacts 2025-09-07T06:20:03.1334865Z output-dir: /home/ec2-user/actions-runner/_work/_temp/artifacts/externals 2025-09-07T06:20:03.1335390Z cuda-version: 12.8.1 2025-09-07T06:20:03.1335656Z env: 2025-09-07T06:20:03.1335867Z PY_VERS: 3.12 2025-09-07T06:20:03.1336213Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:20:03.1336636Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:20:03.1336965Z BUILD_DEVICE: cu128 2025-09-07T06:20:03.1337312Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T06:20:03.1337981Z container_name: fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be 2025-09-07T06:20:03.1338506Z ##[endgroup] 2025-09-07T06:20:03.1524073Z ##[group]Run set -euo pipefail 
2025-09-07T06:20:03.1524437Z set -euo pipefail 2025-09-07T06:20:03.1524722Z python3 --version 2025-09-07T06:20:03.1524986Z docker images 2025-09-07T06:20:03.1525253Z START_TIME=$(date +%s) 2025-09-07T06:20:03.1525522Z ( 2025-09-07T06:20:03.1525751Z  cd .ci/lumen_cli 2025-09-07T06:20:03.1526035Z  python3 -m pip install -e . 2025-09-07T06:20:03.1526360Z ) 2025-09-07T06:20:03.1526769Z MAX_JOBS="$(nproc --ignore=6)" 2025-09-07T06:20:03.1527094Z export MAX_JOBS 2025-09-07T06:20:03.1527362Z  2025-09-07T06:20:03.1527668Z # Split the comma-separated list and build each target 2025-09-07T06:20:03.1528138Z IFS=',' read -ra TARGETS <<< "$BUILD_TARGETS" 2025-09-07T06:20:03.1528527Z for target in "${TARGETS[@]}"; do 2025-09-07T06:20:03.1528908Z  OUTPUT_DIR="$PARENT_OUTPUT_DIR/$target" 2025-09-07T06:20:03.1529266Z  export OUTPUT_DIR 2025-09-07T06:20:03.1529706Z  echo "Building external package: $target in directory $OUTPUT_DIR" 2025-09-07T06:20:03.1530240Z  python3 -m cli.run build external "$target" 2025-09-07T06:20:03.1530598Z done 2025-09-07T06:20:03.1530825Z  2025-09-07T06:20:03.1531042Z END_TIME=$(date +%s) 2025-09-07T06:20:03.1531320Z { 2025-09-07T06:20:03.1531751Z  echo "build_time=$((END_TIME - START_TIME))" 2025-09-07T06:20:03.1532161Z  if [ -d "$PARENT_OUTPUT_DIR" ]; then 2025-09-07T06:20:03.1532668Z  echo "output_dir=$PARENT_OUTPUT_DIR" 2025-09-07T06:20:03.1533192Z  fi 2025-09-07T06:20:03.1533455Z } >> "$GITHUB_OUTPUT" 2025-09-07T06:20:03.1543928Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T06:20:03.1544322Z env: 2025-09-07T06:20:03.1544521Z PY_VERS: 3.12 2025-09-07T06:20:03.1544831Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:20:03.1545208Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T06:20:03.1545501Z BUILD_DEVICE: cu128 2025-09-07T06:20:03.1545806Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T06:20:03.1546354Z container_name: fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be 
2025-09-07T06:20:03.1546902Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-09-07T06:20:03.1547271Z SCCACHE_REGION: us-east-1 2025-09-07T06:20:03.1547545Z CUDA_VERSION: 12.8.1 2025-09-07T06:20:03.1547802Z TORCH_CUDA_ARCH_LIST: 8.0;8.9;9.0;10.0;12.0 2025-09-07T06:20:03.1548177Z BASE_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:20:03.1548520Z BUILD_TARGETS: vllm 2025-09-07T06:20:03.1548937Z PARENT_OUTPUT_DIR: /home/ec2-user/actions-runner/_work/_temp/artifacts/externals 2025-09-07T06:20:03.1549678Z TORCH_WHEELS_PATH: /home/ec2-user/actions-runner/_work/_temp/artifacts 2025-09-07T06:20:03.1550115Z ##[endgroup] 2025-09-07T06:20:03.1600439Z Python 3.9.23 2025-09-07T06:20:03.1725320Z REPOSITORY TAG IMAGE ID CREATED SIZE 2025-09-07T06:20:03.1725998Z pytorch/manylinux2_28-builder cuda12.8 ab9df097091a 28 hours ago 17.6GB 2025-09-07T06:20:03.5543466Z Defaulting to user installation because normal site-packages is not writeable 2025-09-07T06:20:03.5708333Z Obtaining file:///home/ec2-user/actions-runner/_work/pytorch/pytorch/.ci/lumen_cli 2025-09-07T06:20:03.6969325Z Installing build dependencies: started 2025-09-07T06:20:06.4407809Z Installing build dependencies: finished with status 'done' 2025-09-07T06:20:06.4435623Z Checking if build backend supports build_editable: started 2025-09-07T06:20:06.5744584Z Checking if build backend supports build_editable: finished with status 'done' 2025-09-07T06:20:06.5753330Z Getting requirements to build editable: started 2025-09-07T06:20:06.7514883Z Getting requirements to build editable: finished with status 'done' 2025-09-07T06:20:06.7523157Z Preparing editable metadata (pyproject.toml): started 2025-09-07T06:20:06.9296673Z Preparing editable metadata (pyproject.toml): finished with status 'done' 2025-09-07T06:20:07.0410205Z Collecting docker==7.1.0 2025-09-07T06:20:07.0589171Z Downloading docker-7.1.0-py3-none-any.whl (147 kB) 2025-09-07T06:20:07.2207171Z Collecting pyyaml==6.0.2 2025-09-07T06:20:07.2249609Z 
Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB) 2025-09-07T06:20:08.2844624Z Collecting uv==0.8.6 2025-09-07T06:20:08.2912712Z Downloading uv-0.8.6-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.3 MB) 2025-09-07T06:20:08.6258650Z Collecting pytest==7.3.2 2025-09-07T06:20:08.6310323Z Downloading pytest-7.3.2-py3-none-any.whl (320 kB) 2025-09-07T06:20:08.8429482Z Collecting GitPython==3.1.45 2025-09-07T06:20:08.8469667Z Downloading gitpython-3.1.45-py3-none-any.whl (208 kB) 2025-09-07T06:20:09.0447651Z Collecting urllib3>=1.26.0 2025-09-07T06:20:09.0487103Z Downloading urllib3-2.5.0-py3-none-any.whl (129 kB) 2025-09-07T06:20:09.2255260Z Collecting requests>=2.26.0 2025-09-07T06:20:09.2294573Z Downloading requests-2.32.5-py3-none-any.whl (64 kB) 2025-09-07T06:20:09.3697595Z Collecting typing-extensions>=3.10.0.2 2025-09-07T06:20:09.3741543Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB) 2025-09-07T06:20:09.4448867Z Collecting gitdb<5,>=4.0.1 2025-09-07T06:20:09.4491347Z Downloading gitdb-4.0.12-py3-none-any.whl (62 kB) 2025-09-07T06:20:09.5567708Z Collecting packaging 2025-09-07T06:20:09.5606547Z Downloading packaging-25.0-py3-none-any.whl (66 kB) 2025-09-07T06:20:09.6284775Z Collecting pluggy<2.0,>=0.12 2025-09-07T06:20:09.6333471Z Downloading pluggy-1.6.0-py3-none-any.whl (20 kB) 2025-09-07T06:20:09.7174874Z Collecting tomli>=1.0.0 2025-09-07T06:20:09.7213303Z Downloading tomli-2.2.1-py3-none-any.whl (14 kB) 2025-09-07T06:20:09.8191037Z Collecting iniconfig 2025-09-07T06:20:09.8230957Z Downloading iniconfig-2.1.0-py3-none-any.whl (6.0 kB) 2025-09-07T06:20:09.9393806Z Collecting exceptiongroup>=1.0.0rc8 2025-09-07T06:20:09.9434473Z Downloading exceptiongroup-1.3.0-py3-none-any.whl (16 kB) 2025-09-07T06:20:10.0594623Z Collecting smmap<6,>=3.0.1 2025-09-07T06:20:10.0650165Z Downloading smmap-5.0.2-py3-none-any.whl (24 kB) 2025-09-07T06:20:10.1261480Z Requirement already satisfied: idna<4,>=2.5 in 
/usr/lib/python3.9/site-packages (from requests>=2.26.0->docker==7.1.0->lumen-ci==0.1.0) (2.10) 2025-09-07T06:20:10.2204345Z Collecting certifi>=2017.4.17 2025-09-07T06:20:10.2244447Z Downloading certifi-2025.8.3-py3-none-any.whl (161 kB) 2025-09-07T06:20:10.7253676Z Collecting charset_normalizer<4,>=2 2025-09-07T06:20:10.7302878Z Downloading charset_normalizer-3.4.3-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (152 kB) 2025-09-07T06:20:10.8019901Z Building wheels for collected packages: lumen-ci 2025-09-07T06:20:10.8025079Z Building editable for lumen-ci (pyproject.toml): started 2025-09-07T06:20:11.0049018Z Building editable for lumen-ci (pyproject.toml): finished with status 'done' 2025-09-07T06:20:11.0054627Z Created wheel for lumen-ci: filename=lumen_ci-0.1.0-0.editable-py3-none-any.whl size=2721 sha256=f629b3e61e0315c82b77a5725b43e381c6485d402a78a652675b811fed57e7e1 2025-09-07T06:20:11.0055993Z Stored in directory: /tmp/pip-ephem-wheel-cache-uqf51bjq/wheels/99/21/02/221df53baf03cd937166e2aa8f8dff3cd05f5c929f2b22b56e 2025-09-07T06:20:11.0071295Z Successfully built lumen-ci 2025-09-07T06:20:11.1500123Z Installing collected packages: urllib3, typing-extensions, smmap, charset-normalizer, certifi, tomli, requests, pluggy, packaging, iniconfig, gitdb, exceptiongroup, uv, pyyaml, pytest, GitPython, docker, lumen-ci 2025-09-07T06:20:11.5314408Z WARNING: The script normalizer is installed in '/home/ec2-user/.local/bin' which is not on PATH. 2025-09-07T06:20:11.5315423Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2025-09-07T06:20:13.1279243Z WARNING: The scripts py.test and pytest are installed in '/home/ec2-user/.local/bin' which is not on PATH. 2025-09-07T06:20:13.1280267Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 
2025-09-07T06:20:13.4235632Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 2025-09-07T06:20:13.4237670Z Successfully installed GitPython-3.1.45 certifi-2025.8.3 charset-normalizer-3.4.3 docker-7.1.0 exceptiongroup-1.3.0 gitdb-4.0.12 iniconfig-2.1.0 lumen-ci-0.1.0 packaging-25.0 pluggy-1.6.0 pytest-7.3.2 pyyaml-6.0.2 requests-2.32.5 smmap-5.0.2 tomli-2.2.1 typing-extensions-4.15.0 urllib3-2.5.0 uv-0.8.6 2025-09-07T06:20:13.4239306Z awscli 2.25.0 requires urllib3<1.27,>=1.25.4, but you have urllib3 2.5.0 which is incompatible. 2025-09-07T06:20:13.5190376Z Building external package: vllm in directory /home/ec2-user/actions-runner/_work/_temp/artifacts/externals/vllm 2025-09-07T06:20:13.7390489Z 2025-09-07 06:20:13,738 [INFO] cli.lib.core.vllm.vllm_build: Running vllm build with inputs: VllmBuildParameters(use_torch_whl=True, torch_whls_path=PosixPath('/home/ec2-user/actions-runner/_work/_temp/artifacts'), use_local_base_image=True, base_image='pytorch/manylinux2_28-builder:cuda12.8', use_local_dockerfile=True, dockerfile_path=PosixPath('/home/ec2-user/actions-runner/_work/pytorch/pytorch/.github/ci_configs/vllm/Dockerfile.tmp_vllm'), output_dir=PosixPath('/home/ec2-user/actions-runner/_work/_temp/artifacts/externals/vllm'), target_stage='export-wheels', tag_name='vllm-wheels', cuda_version='12.8.1', python_version='3.12', max_jobs='42', sccache_bucket='ossci-compiler-cache-circleci-v2', sccache_region='us-east-1', torch_cuda_arch_list='8.0;8.9;9.0;10.0;12.0') 2025-09-07T06:20:13.7395070Z 2025-09-07 06:20:13,738 [INFO] cli.lib.common.git_helper: Cloning vllm to vllm 2025-09-07T06:20:13.9006555Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 20% - remote: Counting objects: 20% (13/62) 2025-09-07T06:20:13.9007469Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 25% - remote: Counting objects: 25% (16/62) 
2025-09-07T06:20:13.9008327Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 30% - remote: Counting objects: 30% (19/62) 2025-09-07T06:20:13.9009173Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 35% - remote: Counting objects: 35% (22/62) 2025-09-07T06:20:13.9010024Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 40% - remote: Counting objects: 40% (25/62) 2025-09-07T06:20:13.9010860Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 45% - remote: Counting objects: 45% (28/62) 2025-09-07T06:20:13.9011682Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 50% - remote: Counting objects: 50% (31/62) 2025-09-07T06:20:13.9013173Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 70% - remote: Counting objects: 70% (44/62) 2025-09-07T06:20:13.9014080Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 75% - remote: Counting objects: 75% (47/62) 2025-09-07T06:20:13.9014924Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 80% - remote: Counting objects: 80% (50/62) 2025-09-07T06:20:13.9015787Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 85% - remote: Counting objects: 85% (53/62) 2025-09-07T06:20:13.9016639Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 90% - remote: Counting objects: 90% (56/62) 2025-09-07T06:20:13.9017501Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 95% - remote: Counting objects: 95% (59/62) 2025-09-07T06:20:13.9018370Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 100% - remote: Counting objects: 100% (62/62) 2025-09-07T06:20:13.9019360Z 2025-09-07 06:20:13,900 [INFO] cli.lib.common.git_helper: Progress: 10% - remote: Compressing objects: 10% (5/47) 2025-09-07T06:20:13.9033708Z 2025-09-07 06:20:13,903 [INFO] cli.lib.common.git_helper: Progress: 25% - remote: Compressing objects: 25% (12/47) 
2025-09-07T06:20:13.9035014Z 2025-09-07 06:20:13,903 [INFO] cli.lib.common.git_helper: Progress: 40% - remote: Compressing objects: 40% (19/47) 2025-09-07T06:20:13.9037235Z 2025-09-07 06:20:13,903 [INFO] cli.lib.common.git_helper: Progress: 55% - remote: Compressing objects: 55% (26/47) 2025-09-07T06:20:13.9039796Z 2025-09-07 06:20:13,903 [INFO] cli.lib.common.git_helper: Progress: 65% - remote: Compressing objects: 65% (31/47) 2025-09-07T06:20:13.9040847Z 2025-09-07 06:20:13,904 [INFO] cli.lib.common.git_helper: Progress: 70% - remote: Compressing objects: 70% (33/47) 2025-09-07T06:20:13.9042213Z 2025-09-07 06:20:13,904 [INFO] cli.lib.common.git_helper: Progress: 80% - remote: Compressing objects: 80% (38/47) 2025-09-07T06:20:13.9043532Z 2025-09-07 06:20:13,904 [INFO] cli.lib.common.git_helper: Progress: 85% - remote: Compressing objects: 85% (40/47) 2025-09-07T06:20:13.9046242Z 2025-09-07 06:20:13,904 [INFO] cli.lib.common.git_helper: Progress: 95% - remote: Compressing objects: 95% (45/47) 2025-09-07T06:20:13.9047333Z 2025-09-07 06:20:13,904 [INFO] cli.lib.common.git_helper: Progress: 100% - remote: Compressing objects: 100% (47/47) 2025-09-07T06:20:13.9266893Z 2025-09-07 06:20:13,926 [INFO] cli.lib.common.git_helper: Progress: 0% - Receiving objects: 0% (1/110147) 2025-09-07T06:20:14.0284250Z 2025-09-07 06:20:14,028 [INFO] cli.lib.common.git_helper: Progress: 5% - Receiving objects: 5% (5508/110147) 2025-09-07T06:20:14.2371819Z 2025-09-07 06:20:14,236 [INFO] cli.lib.common.git_helper: Progress: 10% - Receiving objects: 10% (11015/110147) 2025-09-07T06:20:14.4722427Z 2025-09-07 06:20:14,471 [INFO] cli.lib.common.git_helper: Progress: 15% - Receiving objects: 15% (16523/110147), 20.37 MiB | 40.72 MiB/s 2025-09-07T06:20:14.6220026Z 2025-09-07 06:20:14,621 [INFO] cli.lib.common.git_helper: Progress: 20% - Receiving objects: 20% (22030/110147), 20.37 MiB | 40.72 MiB/s 2025-09-07T06:20:14.7508326Z 2025-09-07 06:20:14,750 [INFO] cli.lib.common.git_helper: Progress: 
25% - Receiving objects: 25% (27537/110147), 20.37 MiB | 40.72 MiB/s 2025-09-07T06:20:14.8559538Z 2025-09-07 06:20:14,855 [INFO] cli.lib.common.git_helper: Progress: 30% - Receiving objects: 30% (33045/110147), 20.37 MiB | 40.72 MiB/s 2025-09-07T06:20:14.9662418Z 2025-09-07 06:20:14,966 [INFO] cli.lib.common.git_helper: Progress: 35% - Receiving objects: 35% (38552/110147), 41.97 MiB | 41.96 MiB/s 2025-09-07T06:20:15.0792504Z 2025-09-07 06:20:15,078 [INFO] cli.lib.common.git_helper: Progress: 40% - Receiving objects: 40% (44059/110147), 41.97 MiB | 41.96 MiB/s 2025-09-07T06:20:15.1720042Z 2025-09-07 06:20:15,171 [INFO] cli.lib.common.git_helper: Progress: 45% - Receiving objects: 45% (49567/110147), 41.97 MiB | 41.96 MiB/s 2025-09-07T06:20:15.2560938Z 2025-09-07 06:20:15,255 [INFO] cli.lib.common.git_helper: Progress: 50% - Receiving objects: 50% (55074/110147), 41.97 MiB | 41.96 MiB/s 2025-09-07T06:20:15.3559706Z 2025-09-07 06:20:15,355 [INFO] cli.lib.common.git_helper: Progress: 55% - Receiving objects: 55% (60581/110147), 41.97 MiB | 41.96 MiB/s 2025-09-07T06:20:15.4226790Z 2025-09-07 06:20:15,422 [INFO] cli.lib.common.git_helper: Progress: 60% - Receiving objects: 60% (66089/110147), 64.29 MiB | 42.85 MiB/s 2025-09-07T06:20:15.4799281Z 2025-09-07 06:20:15,479 [INFO] cli.lib.common.git_helper: Progress: 65% - Receiving objects: 65% (71596/110147), 64.29 MiB | 42.85 MiB/s 2025-09-07T06:20:15.5050367Z 2025-09-07 06:20:15,504 [INFO] cli.lib.common.git_helper: Progress: 70% - Receiving objects: 70% (77103/110147), 64.29 MiB | 42.85 MiB/s 2025-09-07T06:20:15.5365063Z 2025-09-07 06:20:15,536 [INFO] cli.lib.common.git_helper: Progress: 75% - Receiving objects: 75% (82611/110147), 64.29 MiB | 42.85 MiB/s 2025-09-07T06:20:15.5632994Z 2025-09-07 06:20:15,563 [INFO] cli.lib.common.git_helper: Progress: 80% - Receiving objects: 80% (88118/110147), 64.29 MiB | 42.85 MiB/s 2025-09-07T06:20:15.6055429Z 2025-09-07 06:20:15,605 [INFO] cli.lib.common.git_helper: Progress: 85% - 
Receiving objects: 85% (93625/110147), 64.29 MiB | 42.85 MiB/s 2025-09-07T06:20:15.6348406Z 2025-09-07 06:20:15,634 [INFO] cli.lib.common.git_helper: Progress: 90% - Receiving objects: 90% (99133/110147), 64.29 MiB | 42.85 MiB/s 2025-09-07T06:20:15.7009927Z 2025-09-07 06:20:15,700 [INFO] cli.lib.common.git_helper: Progress: 95% - Receiving objects: 95% (104640/110147), 64.29 MiB | 42.85 MiB/s 2025-09-07T06:20:15.7394385Z 2025-09-07 06:20:15,739 [INFO] cli.lib.common.git_helper: Progress: 100% - Receiving objects: 100% (110147/110147), 64.29 MiB | 42.85 MiB/s 2025-09-07T06:20:15.7654853Z 2025-09-07 06:20:15,765 [INFO] cli.lib.common.git_helper: Resolving deltas: 0% (0/87202) 2025-09-07T06:20:15.8158382Z 2025-09-07 06:20:15,815 [INFO] cli.lib.common.git_helper: Progress: 5% - Resolving deltas: 5% (4362/87202) 2025-09-07T06:20:15.8776075Z 2025-09-07 06:20:15,877 [INFO] cli.lib.common.git_helper: Progress: 10% - Resolving deltas: 10% (8725/87202) 2025-09-07T06:20:15.9435088Z 2025-09-07 06:20:15,943 [INFO] cli.lib.common.git_helper: Progress: 15% - Resolving deltas: 15% (13082/87202) 2025-09-07T06:20:16.0374281Z 2025-09-07 06:20:16,037 [INFO] cli.lib.common.git_helper: Progress: 20% - Resolving deltas: 20% (17443/87202) 2025-09-07T06:20:16.0872994Z 2025-09-07 06:20:16,087 [INFO] cli.lib.common.git_helper: Progress: 25% - Resolving deltas: 25% (21801/87202) 2025-09-07T06:20:16.1193142Z 2025-09-07 06:20:16,119 [INFO] cli.lib.common.git_helper: Progress: 30% - Resolving deltas: 30% (26161/87202) 2025-09-07T06:20:16.1582946Z 2025-09-07 06:20:16,158 [INFO] cli.lib.common.git_helper: Progress: 35% - Resolving deltas: 35% (30521/87202) 2025-09-07T06:20:16.2120313Z 2025-09-07 06:20:16,211 [INFO] cli.lib.common.git_helper: Progress: 40% - Resolving deltas: 40% (34883/87202) 2025-09-07T06:20:16.2600195Z 2025-09-07 06:20:16,259 [INFO] cli.lib.common.git_helper: Progress: 45% - Resolving deltas: 45% (39254/87202) 2025-09-07T06:20:16.2919928Z 2025-09-07 06:20:16,291 [INFO] 
cli.lib.common.git_helper: Progress: 50% - Resolving deltas: 50% (43601/87202) 2025-09-07T06:20:16.3206057Z 2025-09-07 06:20:16,320 [INFO] cli.lib.common.git_helper: Progress: 55% - Resolving deltas: 55% (47962/87202) 2025-09-07T06:20:16.3534677Z 2025-09-07 06:20:16,353 [INFO] cli.lib.common.git_helper: Progress: 60% - Resolving deltas: 60% (52322/87202) 2025-09-07T06:20:16.3863262Z 2025-09-07 06:20:16,386 [INFO] cli.lib.common.git_helper: Progress: 65% - Resolving deltas: 65% (56682/87202) 2025-09-07T06:20:16.4317810Z 2025-09-07 06:20:16,431 [INFO] cli.lib.common.git_helper: Progress: 70% - Resolving deltas: 70% (61042/87202) 2025-09-07T06:20:16.4598793Z 2025-09-07 06:20:16,459 [INFO] cli.lib.common.git_helper: Progress: 75% - Resolving deltas: 75% (65402/87202) 2025-09-07T06:20:16.4964099Z 2025-09-07 06:20:16,496 [INFO] cli.lib.common.git_helper: Progress: 80% - Resolving deltas: 80% (69763/87202) 2025-09-07T06:20:16.5230493Z 2025-09-07 06:20:16,522 [INFO] cli.lib.common.git_helper: Progress: 85% - Resolving deltas: 85% (74122/87202) 2025-09-07T06:20:16.5500700Z 2025-09-07 06:20:16,549 [INFO] cli.lib.common.git_helper: Progress: 90% - Resolving deltas: 90% (78482/87202) 2025-09-07T06:20:16.5752698Z 2025-09-07 06:20:16,575 [INFO] cli.lib.common.git_helper: Progress: 95% - Resolving deltas: 95% (82842/87202) 2025-09-07T06:20:16.5960019Z 2025-09-07 06:20:16,595 [INFO] cli.lib.common.git_helper: Progress: 100% - Resolving deltas: 100% (87202/87202) 2025-09-07T06:20:17.1590211Z 2025-09-07 06:20:17,158 [INFO] cli.lib.common.git_helper: Checking out pinned vllm commit 4172235ab78b09989fb56edaf734dbee283dda3e 2025-09-07T06:20:17.2053436Z 2025-09-07 06:20:17,204 [INFO] cli.lib.common.git_helper: Successfully cloned vllm 2025-09-07T06:20:18.8411258Z 2025-09-07 06:20:18,840 [INFO] cli.lib.core.vllm.vllm_build: Running docker build: 2025-09-07T06:20:18.8415136Z docker buildx build --output type=local,dest=/home/ec2-user/actions-runner/_work/_temp/artifacts/externals/vllm -f 
docker/Dockerfile.nightly_torch --pull=false --build-arg TORCH_WHEELS_PATH=tmp --build-arg BUILD_BASE_IMAGE=pytorch/manylinux2_28-builder:cuda12.8 --build-arg FINAL_BASE_IMAGE=pytorch/manylinux2_28-builder:cuda12.8 --build-arg max_jobs=42 --build-arg CUDA_VERSION=12.8.1 --build-arg PYTHON_VERSION=3.12 --build-arg USE_SCCACHE=1 --build-arg SCCACHE_BUCKET_NAME=ossci-compiler-cache-circleci-v2 --build-arg SCCACHE_REGION_NAME=us-east-1 --build-arg torch_cuda_arch_list='8.0;8.9;9.0;10.0;12.0' --target export-wheels -t vllm-wheels --progress=plain . 2025-09-07T06:20:18.8421990Z 2025-09-07 06:20:18,841 [INFO] cli.lib.common.utils: [cmd] docker buildx build --output type=local,dest=/home/ec2-user/actions-runner/_work/_temp/artifacts/externals/vllm -f docker/Dockerfile.nightly_torch --pull=false --build-arg TORCH_WHEELS_PATH=tmp --build-arg BUILD_BASE_IMAGE=pytorch/manylinux2_28-builder:cuda12.8 --build-arg FINAL_BASE_IMAGE=pytorch/manylinux2_28-builder:cuda12.8 --build-arg max_jobs=42 --build-arg CUDA_VERSION=12.8.1 --build-arg PYTHON_VERSION=3.12 --build-arg USE_SCCACHE=1 --build-arg SCCACHE_BUCKET_NAME=ossci-compiler-cache-circleci-v2 --build-arg SCCACHE_REGION_NAME=us-east-1 --build-arg torch_cuda_arch_list=8.0;8.9;9.0;10.0;12.0 --target export-wheels -t vllm-wheels --progress=plain . 
2025-09-07T06:20:19.2361998Z #0 building with "default" instance using docker driver 2025-09-07T06:20:19.2362579Z 2025-09-07T06:20:19.2362852Z #1 [internal] load build definition from Dockerfile.nightly_torch 2025-09-07T06:20:19.2363332Z #1 transferring dockerfile: 18.57kB done 2025-09-07T06:20:19.2363684Z #1 DONE 0.0s 2025-09-07T06:20:19.2363823Z 2025-09-07T06:20:19.2364300Z #2 [internal] load metadata for docker.io/pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:20:19.2364904Z #2 DONE 0.0s 2025-09-07T06:20:19.2365134Z 2025-09-07T06:20:19.2365325Z #3 [internal] load .dockerignore 2025-09-07T06:20:19.2365669Z #3 transferring context: 442B done 2025-09-07T06:20:19.2366113Z #3 DONE 0.0s 2025-09-07T06:20:19.2366271Z 2025-09-07T06:20:19.2366385Z #4 [internal] load build context 2025-09-07T06:20:19.5365482Z #4 ... 2025-09-07T06:20:19.5365687Z 2025-09-07T06:20:19.5365971Z #5 [vllm-base 1/18] FROM docker.io/pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T06:20:19.5366492Z #5 DONE 0.3s 2025-09-07T06:20:19.5366636Z 2025-09-07T06:20:19.5366765Z #4 [internal] load build context 2025-09-07T06:20:24.1401385Z #4 transferring context: 1.14GB 5.0s 2025-09-07T06:20:29.1448189Z #4 transferring context: 2.32GB 10.0s 2025-09-07T06:20:29.4677158Z #4 ... 2025-09-07T06:20:29.4677376Z 2025-09-07T06:20:29.4677519Z #6 [vllm-base 2/18] WORKDIR /workspace 2025-09-07T06:20:29.5676573Z #6 ... 2025-09-07T06:20:29.5677026Z 2025-09-07T06:20:29.5678266Z #7 [base 2/20] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:20:29.5680018Z #7 1.598 Last metadata expiration check: 1 day, 4:15:10 ago on Sat 06 Sep 2025 02:05:11 AM UTC. 2025-09-07T06:20:29.5680744Z #7 2.386 Package git-2.43.7-1.el8_10.x86_64 is already installed. 
2025-09-07T06:20:29.5681414Z #7 2.387 Package curl-7.61.1-34.el8_10.3.x86_64 is already installed. 2025-09-07T06:20:29.5682180Z #7 2.388 Package wget-1.19.5-12.el8_10.x86_64 is already installed. 2025-09-07T06:20:29.5682721Z #7 2.617 Dependencies resolved. 2025-09-07T06:20:29.5683150Z #7 2.619 ================================================================================ 2025-09-07T06:20:29.5683884Z #7 2.619 Package Arch Version Repository Size 2025-09-07T06:20:29.5684762Z #7 2.619 ================================================================================ 2025-09-07T06:20:29.5685159Z #7 2.619 Installing: 2025-09-07T06:20:29.5685535Z #7 2.619 sudo x86_64 1.9.5p2-1.el8_10.2 baseos 1.0 M 2025-09-07T06:20:29.5686153Z #7 2.619 vim-enhanced x86_64 2:8.0.1763-19.el8_6.4 appstream 1.4 M 2025-09-07T06:20:29.5686816Z #7 2.619 Installing dependencies: 2025-09-07T06:20:29.5687340Z #7 2.619 gpm-libs x86_64 1.20.7-17.el8 appstream 38 k 2025-09-07T06:20:29.5688028Z #7 2.619 vim-common x86_64 2:8.0.1763-19.el8_6.4 appstream 6.3 M 2025-09-07T06:20:29.5688866Z #7 2.619 vim-filesystem noarch 2:8.0.1763-19.el8_6.4 appstream 49 k 2025-09-07T06:20:29.5689342Z #7 2.619 2025-09-07T06:20:29.5689592Z #7 2.619 Transaction Summary 2025-09-07T06:20:29.5690063Z #7 2.619 ================================================================================ 2025-09-07T06:20:29.5690540Z #7 2.619 Install 5 Packages 2025-09-07T06:20:29.5690819Z #7 2.619 2025-09-07T06:20:29.5691071Z #7 2.619 Total download size: 8.8 M 2025-09-07T06:20:29.5691394Z #7 2.620 Installed size: 34 M 2025-09-07T06:20:29.5691712Z #7 2.620 Downloading Packages: 2025-09-07T06:20:29.5692576Z #7 2.788 (1/5): gpm-libs-1.20.7-17.el8.x86_64.rpm 399 kB/s | 38 kB 00:00 2025-09-07T06:20:29.5693206Z #7 2.822 (2/5): sudo-1.9.5p2-1.el8_10.2.x86_64.rpm 8.3 MB/s | 1.0 MB 00:00 2025-09-07T06:20:29.5694057Z #7 2.830 (3/5): vim-filesystem-8.0.1763-19.el8_6.4.noarc 4.9 MB/s | 49 kB 00:00 2025-09-07T06:20:29.5694712Z #7 2.865 (4/5): 
vim-enhanced-8.0.1763-19.el8_6.4.x86_64. 18 MB/s | 1.4 MB 00:00 2025-09-07T06:20:29.5695363Z #7 3.056 (5/5): vim-common-8.0.1763-19.el8_6.4.x86_64.rp 17 MB/s | 6.3 MB 00:00 2025-09-07T06:20:29.5695961Z #7 3.056 -------------------------------------------------------------------------------- 2025-09-07T06:20:29.5696490Z #7 3.056 Total 20 MB/s | 8.8 MB 00:00 2025-09-07T06:20:29.5696933Z #7 3.203 Running transaction check 2025-09-07T06:20:29.5697276Z #7 3.224 Transaction check succeeded. 2025-09-07T06:20:29.5697631Z #7 3.224 Running transaction test 2025-09-07T06:20:29.5697962Z #7 3.351 Transaction test succeeded. 2025-09-07T06:20:29.5698310Z #7 3.354 Running transaction 2025-09-07T06:20:29.5698699Z #7 4.166 Preparing : 1/1 2025-09-07T06:20:29.5699313Z #7 4.749 Installing : vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 1/5 2025-09-07T06:20:29.5699977Z #7 5.795 Installing : vim-common-2:8.0.1763-19.el8_6.4.x86_64 2/5 2025-09-07T06:20:29.5700629Z #7 7.029 Installing : gpm-libs-1.20.7-17.el8.x86_64 3/5 2025-09-07T06:20:29.5701416Z #7 7.152 Running scriptlet: gpm-libs-1.20.7-17.el8.x86_64 3/5 2025-09-07T06:20:29.5702086Z #7 7.323 Installing : vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 4/5 2025-09-07T06:20:29.5702729Z #7 8.042 Installing : sudo-1.9.5p2-1.el8_10.2.x86_64 5/5 2025-09-07T06:20:29.5703373Z #7 8.763 Running scriptlet: sudo-1.9.5p2-1.el8_10.2.x86_64 5/5 2025-09-07T06:20:29.6677308Z #7 8.865 Running scriptlet: vim-common-2:8.0.1763-19.el8_6.4.x86_64 5/5 2025-09-07T06:20:29.6677918Z #7 ... 
2025-09-07T06:20:34.1715889Z 2025-09-07T06:20:34.1716445Z #4 [internal] load build context 2025-09-07T06:20:34.1717000Z #4 transferring context: 3.50GB 15.0s 2025-09-07T06:20:37.2011572Z #4 transferring context: 4.21GB 18.1s done 2025-09-07T06:20:37.8680530Z #4 DONE 18.7s 2025-09-07T06:20:37.8680760Z 2025-09-07T06:20:37.8681941Z #7 [base 2/20] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:20:37.8686641Z #7 8.865 Running scriptlet: vim-common-2:8.0.1763-19.el8_6.4.x86_64 5/5 2025-09-07T06:20:37.8687306Z #7 11.48 Verifying : sudo-1.9.5p2-1.el8_10.2.x86_64 1/5 2025-09-07T06:20:37.8688032Z #7 11.48 Verifying : gpm-libs-1.20.7-17.el8.x86_64 2/5 2025-09-07T06:20:37.8688622Z #7 11.48 Verifying : vim-common-2:8.0.1763-19.el8_6.4.x86_64 3/5 2025-09-07T06:20:37.8689253Z #7 11.48 Verifying : vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 4/5 2025-09-07T06:20:37.8689858Z #7 11.48 Verifying : vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 5/5 2025-09-07T06:20:37.8690350Z #7 11.74 2025-09-07T06:20:37.8690583Z #7 11.74 Installed: 2025-09-07T06:20:37.8690958Z #7 11.74 gpm-libs-1.20.7-17.el8.x86_64 2025-09-07T06:20:37.8691525Z #7 11.74 sudo-1.9.5p2-1.el8_10.2.x86_64 2025-09-07T06:20:37.8692339Z #7 11.74 vim-common-2:8.0.1763-19.el8_6.4.x86_64 2025-09-07T06:20:37.8693237Z #7 11.74 vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 2025-09-07T06:20:37.8693897Z #7 11.74 vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 2025-09-07T06:20:37.8694385Z #7 11.74 2025-09-07T06:20:37.8694879Z #7 11.74 Complete! 2025-09-07T06:20:37.8695137Z #7 11.86 Python 3.12.11 2025-09-07T06:20:37.8695658Z #7 12.07 pip 25.2 from /opt/python/cp312-cp312/lib/python3.12/site-packages/pip (python 3.12) 2025-09-07T06:21:54.6973357Z #7 ... 
2025-09-07T06:21:54.6973621Z 2025-09-07T06:21:54.6974143Z #6 [vllm-base 2/18] WORKDIR /workspace 2025-09-07T06:21:54.6976700Z #6 DONE 95.3s 2025-09-07T06:21:54.8566500Z 2025-09-07T06:21:54.8567954Z #7 [base 2/20] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:21:54.8569226Z #7 DONE 95.3s 2025-09-07T06:21:54.8569380Z 2025-09-07T06:21:54.8569681Z #8 [base 3/20] RUN ldconfig /usr/local/cuda-$(echo 12.8.1 | cut -d. -f1,2)/compat/ 2025-09-07T06:21:55.4461328Z #8 DONE 0.7s 2025-09-07T06:21:55.4461886Z 2025-09-07T06:21:55.4465073Z #9 [vllm-base 3/18] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update -y && apt-get install -y python3.12 python3.12-dev python3.12-venv && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && update-alternatives --set python3 /usr/bin/python3.12 && ln -sf /usr/bin/python3.12-config /usr/bin/python3-config && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:21:56.0838398Z #9 1.380 Last metadata expiration check: 1 day, 4:16:45 ago on Sat 06 Sep 2025 02:05:11 AM UTC. 2025-09-07T06:21:56.7403967Z #9 2.037 Package git-2.43.7-1.el8_10.x86_64 is already installed. 2025-09-07T06:21:56.8714542Z #9 2.038 Package curl-7.61.1-34.el8_10.3.x86_64 is already installed. 2025-09-07T06:21:56.8715186Z #9 2.039 Package wget-1.19.5-12.el8_10.x86_64 is already installed. 2025-09-07T06:21:56.8715646Z #9 2.097 Dependencies resolved. 
2025-09-07T06:21:56.8716005Z #9 2.098 ================================================================================ 2025-09-07T06:21:56.8716509Z #9 2.098 Package Arch Version Repository Size 2025-09-07T06:21:56.8717021Z #9 2.098 ================================================================================ 2025-09-07T06:21:56.8717407Z #9 2.098 Installing: 2025-09-07T06:21:56.8717775Z #9 2.098 sudo x86_64 1.9.5p2-1.el8_10.2 baseos 1.0 M 2025-09-07T06:21:56.8718331Z #9 2.098 vim-enhanced x86_64 2:8.0.1763-19.el8_6.4 appstream 1.4 M 2025-09-07T06:21:56.8718818Z #9 2.098 Installing dependencies: 2025-09-07T06:21:56.8719254Z #9 2.098 gpm-libs x86_64 1.20.7-17.el8 appstream 38 k 2025-09-07T06:21:56.8719819Z #9 2.098 vim-common x86_64 2:8.0.1763-19.el8_6.4 appstream 6.3 M 2025-09-07T06:21:56.8720433Z #9 2.098 vim-filesystem noarch 2:8.0.1763-19.el8_6.4 appstream 49 k 2025-09-07T06:21:56.8720898Z #9 2.098 2025-09-07T06:21:56.8721137Z #9 2.098 Transaction Summary 2025-09-07T06:21:56.8721470Z #9 2.098 ================================================================================ 2025-09-07T06:21:56.8721862Z #9 2.098 Install 5 Packages 2025-09-07T06:21:56.8722143Z #9 2.098 2025-09-07T06:21:56.8722374Z #9 2.099 Total download size: 8.8 M 2025-09-07T06:21:56.8722708Z #9 2.099 Installed size: 34 M 2025-09-07T06:21:56.8723006Z #9 2.099 Downloading Packages: 2025-09-07T06:21:56.8723455Z #9 2.168 (1/5): gpm-libs-1.20.7-17.el8.x86_64.rpm 3.4 MB/s | 38 kB 00:00 2025-09-07T06:21:56.9968600Z #9 2.185 (2/5): sudo-1.9.5p2-1.el8_10.2.x86_64.rpm 37 MB/s | 1.0 MB 00:00 2025-09-07T06:21:56.9969334Z #9 2.193 (3/5): vim-filesystem-8.0.1763-19.el8_6.4.noarc 6.4 MB/s | 49 kB 00:00 2025-09-07T06:21:56.9970382Z #9 2.203 (4/5): vim-enhanced-8.0.1763-19.el8_6.4.x86_64. 
39 MB/s | 1.4 MB 00:00 2025-09-07T06:21:56.9971007Z #9 2.249 (5/5): vim-common-8.0.1763-19.el8_6.4.x86_64.rp 69 MB/s | 6.3 MB 00:00 2025-09-07T06:21:56.9971593Z #9 2.249 -------------------------------------------------------------------------------- 2025-09-07T06:21:56.9972101Z #9 2.249 Total 59 MB/s | 8.8 MB 00:00 2025-09-07T06:21:57.2037279Z #9 2.368 Running transaction check 2025-09-07T06:21:57.2037730Z #9 2.388 Transaction check succeeded. 2025-09-07T06:21:57.2038071Z #9 2.388 Running transaction test 2025-09-07T06:21:57.2038405Z #9 2.501 Transaction test succeeded. 2025-09-07T06:21:57.3522107Z #9 2.503 Running transaction 2025-09-07T06:21:57.4776512Z 2025-09-07T06:21:57.4776941Z #9 ... 2025-09-07T06:21:57.4777143Z 2025-09-07T06:21:57.4777779Z #10 [base 4/20] RUN --mount=type=cache,target=/root/.cache/uv if ! python3 -m uv --version >/dev/null 2>&1; then python3 -m pip install uv==0.8.4; fi 2025-09-07T06:21:57.4778633Z #10 1.425 Collecting uv==0.8.4 2025-09-07T06:21:57.4779233Z #10 1.440 Downloading uv-0.8.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB) 2025-09-07T06:21:57.4780068Z #10 1.453 Downloading uv-0.8.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.8 MB) 2025-09-07T06:21:57.4781343Z #10 1.547 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.8/18.8 MB 218.0 MB/s 0:00:00 2025-09-07T06:21:57.4781812Z #10 1.612 Installing collected packages: uv 2025-09-07T06:21:57.4782210Z #10 1.903 Successfully installed uv-0.8.4 2025-09-07T06:21:57.4784080Z #10 1.903 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-09-07T06:21:57.4785985Z #10 DONE 2.0s 2025-09-07T06:21:57.4786138Z 2025-09-07T06:21:57.4788839Z #9 [vllm-base 3/18] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update -y && apt-get install -y python3.12 python3.12-dev python3.12-venv && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && update-alternatives --set python3 /usr/bin/python3.12 && ln -sf /usr/bin/python3.12-config /usr/bin/python3-config && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:21:57.5855242Z #9 2.682 Preparing : 1/1 2025-09-07T06:21:57.5855730Z #9 ... 2025-09-07T06:21:57.5855861Z 2025-09-07T06:21:57.5855979Z #11 [base 5/20] WORKDIR /workspace 2025-09-07T06:21:57.5856317Z #11 DONE 0.0s 2025-09-07T06:21:57.5856459Z 2025-09-07T06:21:57.5856684Z #12 [base 6/20] COPY requirements/common.txt requirements/common.txt 2025-09-07T06:21:57.5857130Z #12 DONE 0.0s 2025-09-07T06:21:57.5857271Z 2025-09-07T06:21:57.5860067Z #9 [vllm-base 3/18] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update -y && apt-get install -y python3.12 python3.12-dev python3.12-venv && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && update-alternatives --set python3 /usr/bin/python3.12 && ln -sf /usr/bin/python3.12-config /usr/bin/python3-config && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:21:57.5863361Z #9 2.682 Preparing : 1/1 2025-09-07T06:21:57.7735707Z #9 2.882 Installing : vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 1/5 
2025-09-07T06:21:57.7736247Z #9 ... 2025-09-07T06:21:57.7736382Z 2025-09-07T06:21:57.7736618Z #13 [base 7/20] COPY use_existing_torch.py use_existing_torch.py 2025-09-07T06:21:57.7737048Z #13 DONE 0.3s 2025-09-07T06:21:57.9722720Z 2025-09-07T06:21:57.9723289Z #14 [base 8/20] COPY pyproject.toml pyproject.toml 2025-09-07T06:21:57.9723737Z #14 DONE 0.0s 2025-09-07T06:21:57.9723886Z 2025-09-07T06:21:57.9724049Z #15 [base 9/20] RUN python3 use_existing_torch.py 2025-09-07T06:21:58.6019857Z #15 0.577 >>> cleaning requirements/common.txt 2025-09-07T06:21:58.6020364Z #15 0.577 <<< done cleaning requirements/common.txt 2025-09-07T06:21:58.6020756Z #15 0.577 2025-09-07T06:21:58.6021064Z #15 0.577 >>> cleaning pyproject.toml 2025-09-07T06:21:58.6021404Z #15 0.577 removed: 2025-09-07T06:21:58.6021654Z #15 0.577 "torch == 2.8.0", 2025-09-07T06:21:58.6021987Z #15 0.577 <<< done cleaning pyproject.toml 2025-09-07T06:21:58.6022320Z #15 0.577 2025-09-07T06:21:58.6022545Z #15 DONE 0.6s 2025-09-07T06:21:58.6022684Z 2025-09-07T06:21:58.6025867Z #9 [vllm-base 3/18] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update -y && apt-get install -y python3.12 python3.12-dev python3.12-venv && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && update-alternatives --set python3 /usr/bin/python3.12 && ln -sf /usr/bin/python3.12-config /usr/bin/python3-config && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 2025-09-07T06:21:58.6028988Z #9 2.882 Installing : vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 1/5 2025-09-07T06:21:58.6029624Z #9 3.670 Installing : vim-common-2:8.0.1763-19.el8_6.4.x86_64 2/5 2025-09-07T06:21:58.6030249Z #9 3.740 Installing : gpm-libs-1.20.7-17.el8.x86_64 3/5 2025-09-07T06:21:58.6030893Z #9 
3.760 Running scriptlet: gpm-libs-1.20.7-17.el8.x86_64 3/5 2025-09-07T06:21:58.7372534Z #9 3.899 Installing : vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 4/5 2025-09-07T06:21:58.9640275Z #9 4.034 Installing : sudo-1.9.5p2-1.el8_10.2.x86_64 5/5 2025-09-07T06:21:58.9642051Z #9 4.056 Running scriptlet: sudo-1.9.5p2-1.el8_10.2.x86_64 5/5 2025-09-07T06:22:00.9646779Z #9 4.111 Running scriptlet: vim-common-2:8.0.1763-19.el8_6.4.x86_64 5/5 2025-09-07T06:22:00.9647507Z #9 6.261 Verifying : sudo-1.9.5p2-1.el8_10.2.x86_64 1/5 2025-09-07T06:22:00.9648126Z #9 6.261 Verifying : gpm-libs-1.20.7-17.el8.x86_64 2/5 2025-09-07T06:22:01.1151097Z #9 6.261 Verifying : vim-common-2:8.0.1763-19.el8_6.4.x86_64 3/5 2025-09-07T06:22:01.1151835Z #9 6.261 Verifying : vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 4/5 2025-09-07T06:22:01.1559298Z #9 6.261 Verifying : vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 5/5 2025-09-07T06:22:01.1559814Z #9 6.453 2025-09-07T06:22:01.1560062Z #9 6.453 Installed: 2025-09-07T06:22:01.1560443Z #9 6.453 gpm-libs-1.20.7-17.el8.x86_64 2025-09-07T06:22:01.1561030Z #9 6.453 sudo-1.9.5p2-1.el8_10.2.x86_64 2025-09-07T06:22:01.1561618Z #9 6.453 vim-common-2:8.0.1763-19.el8_6.4.x86_64 2025-09-07T06:22:01.1562525Z #9 6.453 vim-enhanced-2:8.0.1763-19.el8_6.4.x86_64 2025-09-07T06:22:01.1563165Z #9 6.453 vim-filesystem-2:8.0.1763-19.el8_6.4.noarch 2025-09-07T06:22:01.1563632Z #9 6.453 2025-09-07T06:22:01.1563861Z #9 6.453 Complete! 2025-09-07T06:22:01.3799855Z #9 6.526 Python 3.12.11 2025-09-07T06:22:01.4245569Z #9 6.721 pip 25.2 from /opt/python/cp312-cp312/lib/python3.12/site-packages/pip (python 3.12) 2025-09-07T06:22:07.5230934Z #9 ... 
2025-09-07T06:22:07.5231471Z 2025-09-07T06:22:07.5243411Z #16 [base 10/20] RUN --mount=type=bind,source=tmp,target=/dist --mount=type=cache,target=/root/.cache/uv if [ -n "tmp" ] && [ "tmp" != "./requirements" ] && [ -d "/dist" ] && ls /dist/torch*.whl >/dev/null 2>&1; then echo "[INFO] Installing torch wheels to build vllm"; torch_whl=$(find /dist -maxdepth 1 -name 'torch-*.whl' -print -quit); vision_whl=$(find /dist -name 'torchvision*.whl' | head -n1 | xargs); audio_whl=$(find /dist -name 'torchaudio*.whl' | head -n1 | xargs); uv pip install --system "${torch_whl}[opt-einsum]" "${vision_whl}" "${audio_whl}" /dist/*.whl; elif [ -n "$PINNED_TORCH_VERSION" ]; then echo "[INFO] Installing pinned torch nightly version to build vllm: $PINNED_TORCH_VERSION"; uv pip install --system "$PINNED_TORCH_VERSION" --index-url https://download.pytorch.org/whl/nightly/cu$(echo 12.8.1 | cut -d. -f1,2 | tr -d '.'); else echo "[INFO] Installing torch nightly with latest one to build vllm"; uv pip install --system torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu$(echo 12.8.1 | cut -d. 
-f1,2 | tr -d '.'); fi 2025-09-07T06:22:07.5248329Z #16 0.840 [INFO] Installing torch wheels to build vllm 2025-09-07T06:22:07.5248855Z #16 0.921 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T06:22:07.5249323Z #16 0.964 Resolved 31 packages in 39ms 2025-09-07T06:22:07.5249695Z #16 7.068 Prepared 31 packages in 6.10s 2025-09-07T06:22:07.5250047Z #16 7.184 Uninstalled 1 package in 115ms 2025-09-07T06:22:07.5250414Z #16 8.879 Installed 31 packages in 1.69s 2025-09-07T06:22:07.5250989Z #16 8.927 + filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T06:22:07.5251590Z #16 8.927 + fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T06:22:07.5252167Z #16 8.927 + jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T06:22:07.5253214Z #16 8.927 + markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T06:22:07.5254036Z #16 8.927 + mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T06:22:07.5254616Z #16 8.927 + networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T06:22:07.5255378Z #16 8.927 + numpy==2.3.2 (from file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T06:22:07.5256346Z #16 8.927 + nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:22:07.5257440Z #16 8.927 + nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:22:07.5258666Z #16 8.927 + nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T06:22:07.5259929Z #16 8.928 + nvidia-cuda-runtime-cu12==12.8.90 (from 
file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:22:07.5261089Z #16 8.928 + nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:22:07.5262078Z #16 8.928 + nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:22:07.5263297Z #16 8.928 + nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:22:07.5264306Z #16 8.928 + nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:22:07.5265269Z #16 8.928 + nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:22:07.5266315Z #16 8.928 + nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:22:07.5267379Z #16 8.928 + nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T06:22:07.5268370Z #16 8.928 + nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:22:07.5269451Z #16 8.928 + nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T06:22:07.5270561Z #16 8.928 + nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:22:07.5271674Z #16 8.928 + nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:22:07.5272357Z #16 8.928 + opt-einsum==3.4.0 
2025-09-07T06:22:07.5272926Z #16 8.928 + pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T06:22:07.5274017Z #16 8.928 + pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T06:22:07.5274834Z #16 8.928 - setuptools==80.9.0 2025-09-07T06:22:07.5275284Z #16 8.928 + setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T06:22:07.5275880Z #16 8.928 + sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T06:22:07.5276637Z #16 8.928 + torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T06:22:07.5277701Z #16 8.928 + torchaudio==2.8.0.dev20250906+cu128 (from file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T06:22:07.5278832Z #16 8.928 + torchvision==0.24.0.dev20250906+cu128 (from file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T06:22:07.5279769Z #16 8.928 + typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T06:23:48.6182503Z #16 DONE 110.2s 2025-09-07T06:23:48.6184436Z 2025-09-07T06:23:48.6187693Z #9 [vllm-base 3/18] RUN if command -v apt-get >/dev/null; then apt-get update -y && apt-get install -y ccache software-properties-common git curl wget sudo vim && add-apt-repository -y ppa:deadsnakes/ppa && apt-get update -y && apt-get install -y python3.12 python3.12-dev python3.12-venv && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && update-alternatives --set python3 /usr/bin/python3.12 && ln -sf /usr/bin/python3.12-config /usr/bin/python3-config && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12; else dnf install -y git curl wget sudo vim; fi && python3 --version && python3 -m pip --version 
2025-09-07T06:23:48.7749284Z #9 DONE 113.9s 2025-09-07T06:23:48.7749782Z 2025-09-07T06:23:48.7750744Z #17 [base 11/20] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system numba==0.61.2 2025-09-07T06:23:49.1364600Z #17 0.512 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T06:23:49.2368913Z #17 0.613 Resolved 3 packages in 95ms 2025-09-07T06:23:49.4040229Z #17 0.615 Downloading numpy (15.8MiB) 2025-09-07T06:23:49.4040665Z #17 0.616 Downloading llvmlite (40.4MiB) 2025-09-07T06:23:49.4041029Z #17 0.629 Downloading numba (3.7MiB) 2025-09-07T06:23:49.6540690Z #17 1.030 Downloading llvmlite 2025-09-07T06:23:49.7977325Z #17 1.174 Downloading numba 2025-09-07T06:23:49.9167105Z #17 1.257 Downloading numpy 2025-09-07T06:23:49.9167557Z #17 1.257 Prepared 3 packages in 643ms 2025-09-07T06:23:49.9167954Z #17 1.293 Uninstalled 1 package in 35ms 2025-09-07T06:23:50.1639109Z #17 1.389 Installed 3 packages in 96ms 2025-09-07T06:23:50.1640011Z #17 1.389 + llvmlite==0.44.0 2025-09-07T06:23:50.1640402Z #17 1.389 + numba==0.61.2 2025-09-07T06:23:50.1641001Z #17 1.389 - numpy==2.3.2 (from file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T06:23:50.1641642Z #17 1.389 + numpy==2.2.6 2025-09-07T06:23:52.3416454Z #17 DONE 3.7s 2025-09-07T06:23:52.4950535Z 2025-09-07T06:23:52.4952201Z #18 [base 12/20] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system -r requirements/common.txt 2025-09-07T06:23:52.8822953Z #18 0.538 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T06:23:53.4791239Z #18 1.135 Resolved 133 packages in 591ms 2025-09-07T06:23:53.6605419Z #18 1.148 Downloading pygments (1.2MiB) 2025-09-07T06:23:53.6605867Z #18 1.148 Downloading aiohttp (1.6MiB) 2025-09-07T06:23:53.6606511Z #18 1.149 Downloading pycountry (6.0MiB) 2025-09-07T06:23:53.6606935Z #18 1.149 Downloading pydantic-core (1.9MiB) 2025-09-07T06:23:53.6607315Z #18 1.150 Downloading hf-xet (3.0MiB) 
2025-09-07T06:23:53.6607655Z #18 1.150 Downloading tokenizers (3.2MiB) 2025-09-07T06:23:53.6608018Z #18 1.160 Downloading scipy (33.5MiB) 2025-09-07T06:23:53.6608356Z #18 1.160 Downloading tiktoken (1.1MiB) 2025-09-07T06:23:53.6608721Z #18 1.161 Downloading soundfile (1.3MiB) 2025-09-07T06:23:53.6609121Z #18 1.162 Downloading opencv-python-headless (51.5MiB) 2025-09-07T06:23:53.6609536Z #18 1.163 Downloading uvloop (4.5MiB) 2025-09-07T06:23:53.6609885Z #18 1.163 Downloading xgrammar (7.5MiB) 2025-09-07T06:23:53.6610251Z #18 1.163 Downloading transformers (11.1MiB) 2025-09-07T06:23:53.6610641Z #18 1.164 Downloading openai-harmony (2.9MiB) 2025-09-07T06:23:53.6611018Z #18 1.165 Downloading outlines-core (2.2MiB) 2025-09-07T06:23:53.6611399Z #18 1.165 Downloading sentencepiece (1.3MiB) 2025-09-07T06:23:53.6611781Z #18 1.166 Downloading mistral-common (6.2MiB) 2025-09-07T06:23:53.6612162Z #18 1.166 Downloading llguidance (14.3MiB) 2025-09-07T06:23:53.6612514Z #18 1.166 Downloading triton (148.4MiB) 2025-09-07T06:23:53.9233175Z #18 1.579 Downloading tiktoken 2025-09-07T06:23:54.0722064Z #18 1.610 Downloading sentencepiece 2025-09-07T06:23:54.0722517Z #18 1.610 Downloading soundfile 2025-09-07T06:23:54.0722841Z #18 1.728 Downloading aiohttp 2025-09-07T06:23:54.0723178Z #18 1.728 Downloading pydantic-core 2025-09-07T06:23:54.2099228Z #18 1.763 Downloading outlines-core 2025-09-07T06:23:54.2099717Z #18 1.866 Downloading openai-harmony 2025-09-07T06:23:54.3404096Z #18 1.893 Downloading hf-xet 2025-09-07T06:23:54.3404519Z #18 1.901 Downloading tokenizers 2025-09-07T06:23:54.3404996Z #18 1.997 Downloading pygments 2025-09-07T06:23:54.5650406Z #18 2.071 Downloading uvloop 2025-09-07T06:23:54.6668358Z #18 2.323 Downloading xgrammar 2025-09-07T06:23:54.8223442Z #18 2.328 Downloading mistral-common 2025-09-07T06:23:54.8343037Z #18 2.491 Downloading pycountry 2025-09-07T06:23:55.0159968Z #18 2.521 Downloading llguidance 2025-09-07T06:23:55.7943256Z #18 3.450 Downloading 
opencv-python-headless 2025-09-07T06:23:56.7888986Z #18 4.445 Downloading triton 2025-09-07T06:23:56.9446046Z #18 4.450 Downloading scipy 2025-09-07T06:23:56.9641146Z #18 4.620 Downloading transformers 2025-09-07T06:23:57.1150700Z #18 4.620 Prepared 104 packages in 3.48s 2025-09-07T06:23:57.4734064Z #18 5.129 Installed 104 packages in 508ms 2025-09-07T06:23:57.6275992Z #18 5.130 + aiohappyeyeballs==2.6.1 2025-09-07T06:23:57.6276823Z #18 5.130 + aiohttp==3.12.15 2025-09-07T06:23:57.6277220Z #18 5.130 + aiosignal==1.4.0 2025-09-07T06:23:57.6277539Z #18 5.130 + annotated-types==0.7.0 2025-09-07T06:23:57.6277854Z #18 5.130 + anyio==4.10.0 2025-09-07T06:23:57.6278143Z #18 5.130 + astor==0.8.1 2025-09-07T06:23:57.6278407Z #18 5.130 + attrs==25.3.0 2025-09-07T06:23:57.6278684Z #18 5.130 + blake3==1.0.5 2025-09-07T06:23:57.6278957Z #18 5.130 + cachetools==6.2.0 2025-09-07T06:23:57.6279264Z #18 5.130 + cbor2==5.7.0 2025-09-07T06:23:57.6279534Z #18 5.130 + certifi==2025.8.3 2025-09-07T06:23:57.6279825Z #18 5.130 + cffi==1.17.1 2025-09-07T06:23:57.6280107Z #18 5.130 + charset-normalizer==3.4.3 2025-09-07T06:23:57.6280439Z #18 5.130 + click==8.2.1 2025-09-07T06:23:57.6280726Z #18 5.130 + cloudpickle==3.1.1 2025-09-07T06:23:57.6281037Z #18 5.130 + compressed-tensors==0.11.0 2025-09-07T06:23:57.6281375Z #18 5.130 + depyf==0.19.0 2025-09-07T06:23:57.6281643Z #18 5.130 + dill==0.4.0 2025-09-07T06:23:57.6281923Z #18 5.130 + diskcache==5.6.3 2025-09-07T06:23:57.6282215Z #18 5.130 + distro==1.9.0 2025-09-07T06:23:57.6282503Z #18 5.130 + dnspython==2.7.0 2025-09-07T06:23:57.6282787Z #18 5.130 + einops==0.8.1 2025-09-07T06:23:57.6283085Z #18 5.130 + email-validator==2.3.0 2025-09-07T06:23:57.6283398Z #18 5.130 + fastapi==0.116.1 2025-09-07T06:23:57.6283705Z #18 5.130 + fastapi-cli==0.0.10 2025-09-07T06:23:57.6284027Z #18 5.130 + fastapi-cloud-cli==0.1.5 2025-09-07T06:23:57.6284494Z #18 5.130 + frozendict==2.4.6 2025-09-07T06:23:57.6284813Z #18 5.131 + frozenlist==1.7.0 
2025-09-07T06:23:57.6285094Z #18 5.131 + gguf==0.17.1 2025-09-07T06:23:57.6285372Z #18 5.131 + h11==0.16.0 2025-09-07T06:23:57.6285634Z #18 5.131 + hf-xet==1.1.9 2025-09-07T06:23:57.6285918Z #18 5.131 + httpcore==1.0.9 2025-09-07T06:23:57.6286204Z #18 5.131 + httptools==0.6.4 2025-09-07T06:23:57.6286501Z #18 5.131 + httpx==0.28.1 2025-09-07T06:23:57.6286782Z #18 5.131 + huggingface-hub==0.34.4 2025-09-07T06:23:57.6287102Z #18 5.131 + idna==3.10 2025-09-07T06:23:57.6287388Z #18 5.131 + interegular==0.3.3 2025-09-07T06:23:57.6287674Z #18 5.131 + jiter==0.10.0 2025-09-07T06:23:57.6287960Z #18 5.131 + jsonschema==4.25.1 2025-09-07T06:23:57.6288295Z #18 5.131 + jsonschema-specifications==2025.4.1 2025-09-07T06:23:57.6288667Z #18 5.131 + lark==1.2.2 2025-09-07T06:23:57.6288933Z #18 5.131 + llguidance==0.7.30 2025-09-07T06:23:57.6289250Z #18 5.131 + lm-format-enforcer==0.11.3 2025-09-07T06:23:57.6289593Z #18 5.131 + markdown-it-py==4.0.0 2025-09-07T06:23:57.6289910Z #18 5.131 + mdurl==0.1.2 2025-09-07T06:23:57.6290185Z #18 5.131 + mistral-common==1.8.4 2025-09-07T06:23:57.6290504Z #18 5.131 + msgspec==0.19.0 2025-09-07T06:23:57.6290799Z #18 5.131 + multidict==6.6.4 2025-09-07T06:23:57.6291081Z #18 5.131 + ninja==1.13.0 2025-09-07T06:23:57.6291375Z #18 5.131 + openai==1.106.1 2025-09-07T06:23:57.6291676Z #18 5.131 + openai-harmony==0.0.4 2025-09-07T06:23:57.6292267Z #18 5.131 + opencv-python-headless==4.12.0.88 2025-09-07T06:23:57.6292880Z #18 5.131 + outlines-core==0.2.10 2025-09-07T06:23:57.6293250Z #18 5.131 + partial-json-parser==0.2.1.1.post6 2025-09-07T06:23:57.6293654Z #18 5.131 + prometheus-client==0.22.1 2025-09-07T06:23:57.6294065Z #18 5.131 + prometheus-fastapi-instrumentator==7.1.0 2025-09-07T06:23:57.6294485Z #18 5.131 + propcache==0.3.2 2025-09-07T06:23:57.6294790Z #18 5.132 + protobuf==6.32.0 2025-09-07T06:23:57.6295103Z #18 5.132 + psutil==7.0.0 2025-09-07T06:23:57.6295415Z #18 5.132 + py-cpuinfo==9.0.0 2025-09-07T06:23:57.6295730Z #18 5.132 + 
pybase64==1.4.2 2025-09-07T06:23:57.6296041Z #18 5.132 + pycountry==24.6.1 2025-09-07T06:23:57.6296344Z #18 5.132 + pycparser==2.22 2025-09-07T06:23:57.6296655Z #18 5.132 + pydantic==2.11.7 2025-09-07T06:23:57.6296964Z #18 5.132 + pydantic-core==2.33.2 2025-09-07T06:23:57.6297319Z #18 5.132 + pydantic-extra-types==2.10.5 2025-09-07T06:23:57.6297669Z #18 5.132 + pygments==2.19.2 2025-09-07T06:23:57.6297990Z #18 5.132 + python-dotenv==1.1.1 2025-09-07T06:23:57.6298323Z #18 5.132 + python-json-logger==3.3.0 2025-09-07T06:23:57.6298823Z #18 5.132 + python-multipart==0.0.20 2025-09-07T06:23:57.6299158Z #18 5.132 + pyyaml==6.0.2 2025-09-07T06:23:57.6299456Z #18 5.132 + pyzmq==27.0.2 2025-09-07T06:23:57.6299756Z #18 5.132 + referencing==0.36.2 2025-09-07T06:23:57.6300062Z #18 5.132 + regex==2025.9.1 2025-09-07T06:23:57.6300364Z #18 5.132 + requests==2.32.5 2025-09-07T06:23:57.6300648Z #18 5.132 + rich==14.1.0 2025-09-07T06:23:57.6300942Z #18 5.132 + rich-toolkit==0.15.1 2025-09-07T06:23:57.6301248Z #18 5.132 + rignore==0.6.4 2025-09-07T06:23:57.6301551Z #18 5.132 + rpds-py==0.27.1 2025-09-07T06:23:57.6301847Z #18 5.132 + safetensors==0.6.2 2025-09-07T06:23:57.6302159Z #18 5.132 + scipy==1.16.1 2025-09-07T06:23:57.6302446Z #18 5.132 + sentencepiece==0.2.1 2025-09-07T06:23:57.6302776Z #18 5.132 + sentry-sdk==2.37.0 2025-09-07T06:23:57.6303093Z #18 5.132 + setproctitle==1.3.7 2025-09-07T06:23:57.6303403Z #18 5.132 + shellingham==1.5.4 2025-09-07T06:23:57.6303710Z #18 5.132 + six==1.17.0 2025-09-07T06:23:57.6303985Z #18 5.132 + sniffio==1.3.1 2025-09-07T06:23:57.6304282Z #18 5.133 + soundfile==0.13.1 2025-09-07T06:23:57.6304717Z #18 5.133 + soxr==0.5.0.post1 2025-09-07T06:23:57.6305015Z #18 5.133 + starlette==0.47.3 2025-09-07T06:23:57.6305297Z #18 5.133 + tiktoken==0.11.0 2025-09-07T06:23:57.6305594Z #18 5.133 + tokenizers==0.22.0 2025-09-07T06:23:57.6305881Z #18 5.133 + tqdm==4.67.1 2025-09-07T06:23:57.6306263Z #18 5.133 + transformers==4.56.1 2025-09-07T06:23:57.6306576Z 
#18 5.133 + triton==3.4.0 2025-09-07T06:23:57.6306845Z #18 5.133 + typer==0.17.4 2025-09-07T06:23:57.6307146Z #18 5.133 + typing-inspection==0.4.1 2025-09-07T06:23:57.6307466Z #18 5.133 + urllib3==2.5.0 2025-09-07T06:23:57.6307757Z #18 5.133 + uvicorn==0.35.0 2025-09-07T06:23:57.6308040Z #18 5.133 + uvloop==0.21.0 2025-09-07T06:23:57.6308334Z #18 5.133 + watchfiles==1.1.0 2025-09-07T06:23:57.6308626Z #18 5.133 + websockets==15.0.1 2025-09-07T06:23:57.6308934Z #18 5.133 + xgrammar==0.1.23 2025-09-07T06:23:57.6309219Z #18 5.133 + yarl==1.20.1 2025-09-07T06:24:14.8699415Z #18 DONE 22.5s 2025-09-07T06:24:15.0232762Z 2025-09-07T06:24:15.0233674Z #19 [base 13/20] RUN echo 7.5;8.0+PTX;9.0a 2025-09-07T06:24:15.3301296Z #19 0.458 7.5;8.0+PTX;9.0a 2025-09-07T06:24:15.4958130Z #19 DONE 0.5s 2025-09-07T06:24:15.4958602Z 2025-09-07T06:24:15.4958972Z #20 [base 14/20] RUN echo 42 2025-09-07T06:24:16.1197601Z #20 0.775 42 2025-09-07T06:24:16.2855449Z #20 DONE 0.8s 2025-09-07T06:24:16.2855684Z 2025-09-07T06:24:16.2855882Z #21 [base 15/20] RUN pip freeze | grep -E 'ninja' 2025-09-07T06:24:17.3093077Z #21 1.175 ninja==1.13.0 2025-09-07T06:24:17.4765630Z #21 DONE 1.2s 2025-09-07T06:24:17.4765857Z 2025-09-07T06:24:17.4767968Z #22 [base 16/20] RUN --mount=type=cache,target=/root/.cache/ccache --mount=type=cache,target=/root/.cache/uv echo 'git clone xformers...' && git clone https://github.com/facebookresearch/xformers.git --recursive && cd xformers && git checkout 5d4b92a5e5a9c6c6d4878283f47d82e17995b468 && git submodule update --init --recursive && echo 'finish git clone xformers...' && rm -rf build && python3 setup.py bdist_wheel --dist-dir=../xformers-dist --verbose && cd .. && rm -rf xformers 2025-09-07T06:24:18.1581576Z #22 0.833 git clone xformers... 2025-09-07T06:24:18.3118474Z #22 0.835 Cloning into 'xformers'... 
2025-09-07T06:24:19.6057290Z #22 2.280 Submodule 'third_party/composable_kernel_tiled' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel_tiled' 2025-09-07T06:24:19.6058648Z #22 2.280 Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-09-07T06:24:19.7617279Z #22 2.280 Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-09-07T06:24:19.7619636Z #22 2.285 Cloning into '/workspace/xformers/third_party/composable_kernel_tiled'... 2025-09-07T06:24:23.0982804Z #22 5.773 Cloning into '/workspace/xformers/third_party/cutlass'... 2025-09-07T06:24:25.3844553Z #22 8.059 Cloning into '/workspace/xformers/third_party/flash-attention'... 2025-09-07T06:24:26.3128773Z #22 8.987 Submodule path 'third_party/composable_kernel_tiled': checked out '50fad035248b154cdfa4505cf5de7465ce146149' 2025-09-07T06:24:27.0134982Z #22 9.688 Submodule path 'third_party/cutlass': checked out 'e9627ce55b42fd2599f58cd4396da9380954def0' 2025-09-07T06:24:27.1152016Z #22 9.787 Submodule path 'third_party/flash-attention': checked out '3ba6f826b199ff68aa9e9139a46280160defa5cd' 2025-09-07T06:24:27.1153284Z #22 9.789 Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-09-07T06:24:27.2703871Z #22 9.790 Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-09-07T06:24:27.2706266Z #22 9.794 Cloning into '/workspace/xformers/third_party/flash-attention/csrc/composable_kernel'... 2025-09-07T06:24:30.5758950Z #22 13.25 Cloning into '/workspace/xformers/third_party/flash-attention/csrc/cutlass'... 
2025-09-07T06:24:33.1989485Z #22 15.87 Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out 'd58f2b8bd0c2adad65a731403673d545d8483acb' 2025-09-07T06:24:33.9580638Z #22 16.63 Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'dc4817921edda44a549197ff3a9dcf5df0636e7b' 2025-09-07T06:24:35.4487503Z #22 18.12 Note: switching to '5d4b92a5e5a9c6c6d4878283f47d82e17995b468'. 2025-09-07T06:24:35.4488023Z #22 18.12 2025-09-07T06:24:35.4488428Z #22 18.12 You are in 'detached HEAD' state. You can look around, make experimental 2025-09-07T06:24:35.4489071Z #22 18.12 changes and commit them, and you can discard any commits you make in this 2025-09-07T06:24:35.4489723Z #22 18.12 state without impacting any branches by switching back to a branch. 2025-09-07T06:24:35.4490188Z #22 18.12 2025-09-07T06:24:35.4490611Z #22 18.12 If you want to create a new branch to retain commits you create, you may 2025-09-07T06:24:35.4491211Z #22 18.12 do so (now or later) by using -c with the switch command. Example: 2025-09-07T06:24:35.4491636Z #22 18.12 2025-09-07T06:24:35.4492098Z #22 18.12 git switch -c <new-branch-name> 2025-09-07T06:24:35.4492714Z #22 18.12 2025-09-07T06:24:35.4492981Z #22 18.12 Or undo this operation with: 2025-09-07T06:24:35.4493300Z #22 18.12 2025-09-07T06:24:35.4493559Z #22 18.12 git switch - 2025-09-07T06:24:35.4493825Z #22 18.12 2025-09-07T06:24:35.4494246Z #22 18.12 Turn off this advice by setting config variable advice.detachedHead to false 2025-09-07T06:24:35.4494745Z #22 18.12 2025-09-07T06:24:35.4495121Z #22 18.12 HEAD is now at 5d4b92a5 Update wheels matrix for ROCM + README update 2025-09-07T06:24:35.6715940Z #22 18.20 finish git clone xformers...
2025-09-07T06:24:39.5458703Z #22 22.22 W0907 06:24:39.544000 316 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/torch/utils/cpp_extension.py:117] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' 2025-09-07T06:24:39.6561288Z #22 22.33 /opt/python/cp312-cp312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated. 2025-09-07T06:24:39.6562212Z #22 22.33 !! 2025-09-07T06:24:39.6562455Z #22 22.33 2025-09-07T06:24:39.6562757Z #22 22.33 ******************************************************************************** 2025-09-07T06:24:39.6563406Z #22 22.33 Please consider removing the following classifiers in favor of a SPDX license expression: 2025-09-07T06:24:39.6563974Z #22 22.33 2025-09-07T06:24:39.6565690Z #22 22.33 License :: OSI Approved :: BSD License 2025-09-07T06:24:39.6566065Z #22 22.33 2025-09-07T06:24:39.6566612Z #22 22.33 See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details. 2025-09-07T06:24:39.6567298Z #22 22.33 ******************************************************************************** 2025-09-07T06:24:39.6567858Z #22 22.33 2025-09-07T06:24:39.6568076Z #22 22.33 !! 
2025-09-07T06:24:39.6568348Z #22 22.33 self._finalize_license_expression() 2025-09-07T06:24:39.7695295Z #22 22.38 running bdist_wheel 2025-09-07T06:24:39.7695669Z #22 22.43 running build 2025-09-07T06:24:39.7695945Z #22 22.43 running build_py 2025-09-07T06:24:39.7696341Z #22 22.44 creating build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:24:39.9860367Z #22 22.44 copying xformers/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:24:39.9861691Z #22 22.44 copying xformers/_cpp_lib.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:24:39.9862993Z #22 22.44 copying xformers/_deprecation_warning.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:24:39.9864537Z #22 22.44 copying xformers/attn_bias_utils.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:24:39.9865923Z #22 22.45 copying xformers/checkpoint.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:24:39.9867216Z #22 22.45 copying xformers/info.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:24:39.9868490Z #22 22.45 copying xformers/test.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:24:39.9869777Z #22 22.45 copying xformers/utils.py -> build/lib.linux-x86_64-cpython-312/xformers 2025-09-07T06:24:39.9871014Z #22 22.45 creating build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9874514Z #22 22.45 copying xformers/benchmarks/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9876027Z #22 22.45 copying xformers/benchmarks/benchmark_attn_decoding.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9877511Z #22 22.45 copying xformers/benchmarks/benchmark_core.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9878942Z #22 22.45 copying xformers/benchmarks/benchmark_indexing.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9880691Z #22 22.45 copying 
xformers/benchmarks/benchmark_mem_eff_attention.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9882563Z #22 22.45 copying xformers/benchmarks/benchmark_merge_attentions.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9884614Z #22 22.45 copying xformers/benchmarks/benchmark_nystrom_utils.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9886628Z #22 22.45 copying xformers/benchmarks/benchmark_revnet.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9888581Z #22 22.45 copying xformers/benchmarks/benchmark_sddmm.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9890730Z #22 22.45 copying xformers/benchmarks/benchmark_sequence_parallel_fused.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9894628Z #22 22.45 copying xformers/benchmarks/benchmark_sp24.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9896587Z #22 22.45 copying xformers/benchmarks/benchmark_swiglu.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9898566Z #22 22.45 copying xformers/benchmarks/benchmark_tiled_matmul.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9900448Z #22 22.45 copying xformers/benchmarks/utils.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks 2025-09-07T06:24:39.9901921Z #22 22.45 creating build/lib.linux-x86_64-cpython-312/xformers/components 2025-09-07T06:24:39.9903359Z #22 22.45 copying xformers/components/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/components 2025-09-07T06:24:39.9905185Z #22 22.45 copying xformers/components/input_projection.py -> build/lib.linux-x86_64-cpython-312/xformers/components 2025-09-07T06:24:39.9907068Z #22 22.45 copying xformers/components/residual.py -> build/lib.linux-x86_64-cpython-312/xformers/components 2025-09-07T06:24:39.9908567Z #22 
22.45 creating build/lib.linux-x86_64-cpython-312/xformers/flash_attn_3 2025-09-07T06:24:39.9910408Z #22 22.45 copying xformers/flash_attn_3/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/flash_attn_3 2025-09-07T06:24:39.9911798Z #22 22.45 creating build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9912989Z #22 22.45 copying xformers/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9914465Z #22 22.45 copying xformers/ops/common.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9916189Z #22 22.45 copying xformers/ops/differentiable_collectives.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9917908Z #22 22.45 copying xformers/ops/indexing.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9919463Z #22 22.45 copying xformers/ops/modpar_layers.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9921001Z #22 22.45 copying xformers/ops/rmsnorm.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9922492Z #22 22.45 copying xformers/ops/rope_padded.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9923972Z #22 22.45 copying xformers/ops/seqpar.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9925625Z #22 22.45 copying xformers/ops/sequence_parallel_fused_ops.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9927556Z #22 22.45 copying xformers/ops/sp24.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9929037Z #22 22.45 copying xformers/ops/swiglu_op.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9930578Z #22 22.45 copying xformers/ops/tiled_matmul.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9932105Z #22 22.45 copying xformers/ops/tree_attention.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9933665Z #22 22.45 
copying xformers/ops/unbind.py -> build/lib.linux-x86_64-cpython-312/xformers/ops 2025-09-07T06:24:39.9934989Z #22 22.45 creating build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:24:39.9936375Z #22 22.45 copying xformers/profiler/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:24:39.9938015Z #22 22.45 copying xformers/profiler/api.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:24:39.9939620Z #22 22.45 copying xformers/profiler/device_limits.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:24:39.9941020Z #22 22.45 copying xformers/profiler/find_slowest.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:24:39.9942415Z #22 22.46 copying xformers/profiler/profile_analyzer.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:24:39.9943761Z #22 22.46 copying xformers/profiler/profiler.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:24:39.9945096Z #22 22.46 copying xformers/profiler/profiler_dcgm.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:24:39.9946456Z #22 22.46 copying xformers/profiler/profiler_dcgm_impl.py -> build/lib.linux-x86_64-cpython-312/xformers/profiler 2025-09-07T06:24:39.9947588Z #22 22.46 creating build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:24:39.9948599Z #22 22.46 copying xformers/sparse/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:24:39.9949818Z #22 22.46 copying xformers/sparse/_csr_ops.py -> build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:24:39.9951153Z #22 22.46 copying xformers/sparse/blocksparse_tensor.py -> build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:24:39.9952457Z #22 22.46 copying xformers/sparse/csr_tensor.py -> build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:24:39.9953682Z #22 22.46 copying xformers/sparse/utils.py -> 
build/lib.linux-x86_64-cpython-312/xformers/sparse 2025-09-07T06:24:39.9954707Z #22 22.46 creating build/lib.linux-x86_64-cpython-312/xformers/triton 2025-09-07T06:24:39.9956113Z #22 22.46 copying xformers/triton/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/triton 2025-09-07T06:24:39.9957296Z #22 22.46 copying xformers/triton/importing.py -> build/lib.linux-x86_64-cpython-312/xformers/triton 2025-09-07T06:24:39.9958846Z #22 22.46 copying xformers/triton/vararg_kernel.py -> build/lib.linux-x86_64-cpython-312/xformers/triton 2025-09-07T06:24:39.9960126Z #22 22.46 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:24:39.9961519Z #22 22.46 copying xformers/_flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:24:39.9962926Z #22 22.46 copying xformers/_flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:24:39.9964371Z #22 22.46 copying xformers/_flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:24:39.9965759Z #22 22.46 copying xformers/_flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:24:39.9967233Z #22 22.46 copying xformers/_flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:24:39.9968806Z #22 22.46 copying xformers/_flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:24:39.9970830Z #22 22.46 copying xformers/_flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:24:39.9972372Z #22 22.46 copying xformers/_flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn 2025-09-07T06:24:39.9973583Z #22 22.46 creating build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:24:39.9974787Z #22 22.46 copying 
xformers/benchmarks/LRA/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:24:39.9976338Z #22 22.46 copying xformers/benchmarks/LRA/batch_fetch_results.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:24:39.9977991Z #22 22.46 copying xformers/benchmarks/LRA/batch_submit.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:24:39.9979612Z #22 22.46 copying xformers/benchmarks/LRA/run_grid_search.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:24:39.9981603Z #22 22.46 copying xformers/benchmarks/LRA/run_tasks.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:24:39.9983275Z #22 22.46 copying xformers/benchmarks/LRA/run_with_submitit.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA 2025-09-07T06:24:39.9984565Z #22 22.46 creating build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA/code 2025-09-07T06:24:39.9985869Z #22 22.46 copying xformers/benchmarks/LRA/code/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA/code 2025-09-07T06:24:39.9987560Z #22 22.46 copying xformers/benchmarks/LRA/code/dataset.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA/code 2025-09-07T06:24:39.9989235Z #22 22.46 copying xformers/benchmarks/LRA/code/model_wrapper.py -> build/lib.linux-x86_64-cpython-312/xformers/benchmarks/LRA/code 2025-09-07T06:24:39.9990626Z #22 22.46 creating build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:39.9992252Z #22 22.46 copying xformers/components/attention/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:39.9994089Z #22 22.46 copying xformers/components/attention/_sputnik_sparse.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:39.9996062Z #22 22.46 copying xformers/components/attention/attention_mask.py -> 
build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:39.9997876Z #22 22.46 copying xformers/components/attention/attention_patterns.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:39.9999544Z #22 22.46 copying xformers/components/attention/base.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:40.0001572Z #22 22.46 copying xformers/components/attention/core.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:40.0003238Z #22 22.46 copying xformers/components/attention/fourier_mix.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:40.0004929Z #22 22.46 copying xformers/components/attention/scaled_dot_product.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:40.0006735Z #22 22.46 copying xformers/components/attention/sparsity_config.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:40.0008425Z #22 22.46 copying xformers/components/attention/utils.py -> build/lib.linux-x86_64-cpython-312/xformers/components/attention 2025-09-07T06:24:40.0009726Z #22 22.47 creating build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:24:40.0010861Z #22 22.47 copying xformers/ops/_triton/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:24:40.0012245Z #22 22.47 copying xformers/ops/_triton/k_index_select_cat.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:24:40.0013807Z #22 22.47 copying xformers/ops/_triton/k_scaled_index_add.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:24:40.0015666Z #22 22.47 copying xformers/ops/_triton/matmul_perf_model.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:24:40.0017112Z #22 22.47 copying xformers/ops/_triton/rmsnorm_kernels.py -> 
build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:24:40.0018594Z #22 22.47 copying xformers/ops/_triton/rope_padded_kernels.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:24:40.0020140Z #22 22.47 copying xformers/ops/_triton/tiled_matmul_kernels.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/_triton 2025-09-07T06:24:40.0021364Z #22 22.47 creating build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0022411Z #22 22.47 copying xformers/ops/fmha/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0023635Z #22 22.47 copying xformers/ops/fmha/attn_bias.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0024834Z #22 22.47 copying xformers/ops/fmha/ck.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0026014Z #22 22.47 copying xformers/ops/fmha/ck_splitk.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0027234Z #22 22.47 copying xformers/ops/fmha/common.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0028461Z #22 22.47 copying xformers/ops/fmha/cutlass.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0029712Z #22 22.47 copying xformers/ops/fmha/dispatch.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0031000Z #22 22.47 copying xformers/ops/fmha/flash.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0032255Z #22 22.47 copying xformers/ops/fmha/flash3.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0033634Z #22 22.47 copying xformers/ops/fmha/merge_training.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0035080Z #22 22.47 copying xformers/ops/fmha/torch_attention_compat.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0036458Z #22 22.47 copying 
xformers/ops/fmha/triton_splitk.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha 2025-09-07T06:24:40.0037613Z #22 22.47 creating build/lib.linux-x86_64-cpython-312/xformers/ops/fmha/_triton 2025-09-07T06:24:40.0038775Z #22 22.47 copying xformers/ops/fmha/_triton/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha/_triton 2025-09-07T06:24:40.0040276Z #22 22.47 copying xformers/ops/fmha/_triton/splitk_kernels.py -> build/lib.linux-x86_64-cpython-312/xformers/ops/fmha/_triton 2025-09-07T06:24:40.0041771Z #22 22.47 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0043305Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0045176Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/bench.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0047066Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/bwd_prefill.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0049037Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/bwd_prefill_fused.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0051497Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/bwd_prefill_onekernel.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0053674Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/bwd_prefill_split.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0055746Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/bwd_ref.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0057548Z #22 22.47 copying 
xformers/_flash_attn/flash_attn_triton_amd/fp8.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0059399Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/fwd_decode.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0061317Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/fwd_prefill.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0063127Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/fwd_ref.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0065024Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/interface_fa.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0067142Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/test.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0069604Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/train.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0071993Z #22 22.47 copying xformers/_flash_attn/flash_attn_triton_amd/utils.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/flash_attn_triton_amd 2025-09-07T06:24:40.0073905Z #22 22.47 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/layers 2025-09-07T06:24:40.0075555Z #22 22.47 copying xformers/_flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/layers 2025-09-07T06:24:40.0077615Z #22 22.47 copying xformers/_flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/layers 2025-09-07T06:24:40.0079664Z #22 22.48 copying xformers/_flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/layers 
2025-09-07T06:24:40.0081299Z #22 22.48 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/losses 2025-09-07T06:24:40.0082927Z #22 22.48 copying xformers/_flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/losses 2025-09-07T06:24:40.0084995Z #22 22.48 copying xformers/_flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/losses 2025-09-07T06:24:40.0086742Z #22 22.48 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0088627Z #22 22.48 copying xformers/_flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0090653Z #22 22.48 copying xformers/_flash_attn/models/baichuan.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0197925Z #22 22.48 copying xformers/_flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0199557Z #22 22.48 copying xformers/_flash_attn/models/bigcode.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0201055Z #22 22.48 copying xformers/_flash_attn/models/btlm.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0202476Z #22 22.48 copying xformers/_flash_attn/models/falcon.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0203980Z #22 22.48 copying xformers/_flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0205858Z #22 22.48 copying xformers/_flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0207792Z #22 22.48 copying xformers/_flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0209905Z #22 22.48 copying xformers/_flash_attn/models/llama.py -> 
build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0211316Z #22 22.48 copying xformers/_flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0212827Z #22 22.48 copying xformers/_flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/models 2025-09-07T06:24:40.0214038Z #22 22.48 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:24:40.0215275Z #22 22.48 copying xformers/_flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:24:40.0216869Z #22 22.48 copying xformers/_flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:24:40.0218505Z #22 22.48 copying xformers/_flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:24:40.0220065Z #22 22.48 copying xformers/_flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:24:40.0221601Z #22 22.48 copying xformers/_flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/modules 2025-09-07T06:24:40.0222794Z #22 22.48 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:24:40.0223994Z #22 22.48 copying xformers/_flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:24:40.0225570Z #22 22.48 copying xformers/_flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:24:40.0227087Z #22 22.48 copying xformers/_flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:24:40.0246125Z #22 22.48 copying xformers/_flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:24:40.0247906Z #22 22.48 copying 
xformers/_flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops 2025-09-07T06:24:40.0249121Z #22 22.48 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:24:40.0250318Z #22 22.48 copying xformers/_flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:24:40.0251804Z #22 22.48 copying xformers/_flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:24:40.0253460Z #22 22.48 copying xformers/_flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:24:40.0255330Z #22 22.48 copying xformers/_flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:24:40.0256787Z #22 22.48 copying xformers/_flash_attn/utils/library.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:24:40.0258254Z #22 22.48 copying xformers/_flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:24:40.0259754Z #22 22.48 copying xformers/_flash_attn/utils/testing.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:24:40.0261266Z #22 22.48 copying xformers/_flash_attn/utils/torch.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/utils 2025-09-07T06:24:40.0262502Z #22 22.48 creating build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:24:40.0263827Z #22 22.48 copying xformers/_flash_attn/ops/triton/__init__.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:24:40.0265478Z #22 22.48 copying xformers/_flash_attn/ops/triton/cross_entropy.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:24:40.0267155Z #22 22.48 copying xformers/_flash_attn/ops/triton/k_activations.py -> 
build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:24:40.0268987Z #22 22.48 copying xformers/_flash_attn/ops/triton/layer_norm.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:24:40.0270612Z #22 22.48 copying xformers/_flash_attn/ops/triton/linear.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:24:40.0272203Z #22 22.48 copying xformers/_flash_attn/ops/triton/mlp.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:24:40.0273802Z #22 22.48 copying xformers/_flash_attn/ops/triton/rotary.py -> build/lib.linux-x86_64-cpython-312/xformers/_flash_attn/ops/triton 2025-09-07T06:24:40.0274938Z #22 22.49 running build_ext 2025-09-07T06:24:40.0276333Z #22 22.50 W0907 06:24:39.822000 316 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/torch/utils/cpp_extension.py:531] There are no g++ version bounds defined for CUDA version 12.8 2025-09-07T06:24:40.0277841Z #22 22.50 building 'xformers.flash_attn_3._C' extension 2025-09-07T06:24:40.0279164Z #22 22.51 creating /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper 2025-09-07T06:24:40.0281175Z #22 22.51 creating /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations 2025-09-07T06:25:25.4983616Z #22 68.17 [1/154] c++ -MMD -MF /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_api.o.d -pthread -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall -fPIC -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include 
-I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/flash_api.cpp -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_api.o -O3 -std=c++17 -DPy_LIMITED_API=0x03090000 -fopenmp -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:26:24.7488343Z #22 127.4 [2/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_prepare_scheduler.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/flash_prepare_scheduler.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_prepare_scheduler.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math 
-DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:26:24.7594519Z #22 127.4 ptxas info : 10 bytes gmem 2025-09-07T06:26:24.7596397Z #22 127.4 ptxas info : Compiling entry function '_ZN5flash32prepare_varlen_num_blocks_kernelEiiiPKiS1_S1_S1_S1_S1_iiiiiN7cutlass10FastDivmodES3_PiS4_b' for 'sm_90a' 2025-09-07T06:26:24.7694657Z #22 127.4 ptxas info : Function properties for _ZN5flash32prepare_varlen_num_blocks_kernelEiiiPKiS1_S1_S1_S1_S1_iiiiiN7cutlass10FastDivmodES3_PiS4_b 2025-09-07T06:26:24.7696311Z #22 127.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:26:24.7697308Z #22 127.4 ptxas info : Used 13 registers, used 1 barriers, 4 bytes smem 2025-09-07T06:26:24.7698121Z #22 127.4 ptxas info : Compile time = 58.904 ms 2025-09-07T06:26:24.7698820Z #22 127.4 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:26:24.7700390Z #22 127.4 ptxas info : Compiling entry function '_ZN5flash32prepare_varlen_num_blocks_kernelEiiiPKiS1_S1_S1_S1_S1_iiiiiN7cutlass10FastDivmodES3_PiS4_b' for 'sm_80' 2025-09-07T06:26:24.7702761Z #22 127.4 ptxas info : Function properties for _ZN5flash32prepare_varlen_num_blocks_kernelEiiiPKiS1_S1_S1_S1_S1_iiiiiN7cutlass10FastDivmodES3_PiS4_b 2025-09-07T06:26:24.7704393Z #22 127.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:26:24.7705487Z #22 127.4 ptxas info : Used 14 registers, used 1 barriers, 4 bytes smem, 481 bytes cmem[0] 2025-09-07T06:26:24.7706458Z #22 127.4 ptxas info : Compile time = 29.191 ms 2025-09-07T06:28:18.2080295Z #22 240.9 [3/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:28:18.3697581Z #22 240.9 ptxas info : 130 bytes gmem, 104 bytes cmem[4] 
2025-09-07T06:28:18.3702770Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3711211Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3715755Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3716785Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:18.3717701Z #22 240.9 ptxas info : Compile time = 1.954 ms 2025-09-07T06:28:18.3722120Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3729886Z #22 240.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3734743Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3735837Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:18.3736790Z #22 240.9 ptxas info : Compile time = 0.904 ms 2025-09-07T06:28:18.3741459Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3750266Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3754903Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3755952Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:18.3756900Z #22 240.9 ptxas info : Compile time = 0.631 ms 2025-09-07T06:28:18.3761583Z #22 240.9 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3770223Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3774989Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3776081Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:18.3776969Z #22 240.9 ptxas info : Compile time = 0.585 ms 2025-09-07T06:28:18.3781635Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3790002Z #22 240.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3796376Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3797212Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:18.3797927Z #22 240.9 ptxas info : Compile time = 0.618 ms 2025-09-07T06:28:18.3801567Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3808165Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3812089Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3813040Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:18.3813730Z #22 240.9 ptxas info : Compile time = 0.556 ms 2025-09-07T06:28:18.3817480Z #22 240.9 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3824114Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3828006Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3828820Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:18.3829529Z #22 240.9 ptxas info : Compile time = 0.564 ms 2025-09-07T06:28:18.3833110Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3839665Z #22 240.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3843325Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3844149Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:18.3844852Z #22 240.9 ptxas info : Compile time = 0.553 ms 2025-09-07T06:28:18.3848438Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3855014Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3858525Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3859352Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:18.3860274Z #22 240.9 ptxas info : Compile time = 0.562 ms 2025-09-07T06:28:18.3863779Z #22 240.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3870161Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3873717Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3874537Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:18.3875251Z #22 240.9 ptxas info : Compile time = 0.634 ms 2025-09-07T06:28:18.3877211Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3880100Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:18.3881892Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3882707Z #22 240.9 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:18.3883414Z #22 240.9 ptxas info : Compile time = 64.234 ms 2025-09-07T06:28:18.3887079Z #22 240.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3894222Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3897919Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3898740Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:18.3899438Z #22 240.9 ptxas info : Compile time = 0.923 ms 2025-09-07T06:28:18.3902883Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3909080Z #22 240.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:18.3912812Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3913636Z #22 240.9 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:18.3914363Z #22 240.9 ptxas info : Compile time = 34.034 ms 2025-09-07T06:28:18.3917851Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3924240Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:18.3927693Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3928493Z #22 240.9 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:18.3929199Z #22 240.9 ptxas info : Compile time = 34.002 ms 2025-09-07T06:28:18.3932976Z #22 240.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3939762Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3943453Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3944267Z #22 240.9 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:18.3944963Z #22 240.9 ptxas info : Compile time = 0.850 ms 2025-09-07T06:28:18.3946763Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.3949687Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:18.3951510Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3952320Z #22 240.9 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:18.3953026Z #22 240.9 ptxas info : Compile time = 76.749 ms 2025-09-07T06:28:18.3953559Z #22 240.9 ptxas info : 10 bytes gmem 
2025-09-07T06:28:18.3957164Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.3963964Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3967567Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3968310Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:18.3968920Z #22 240.9 ptxas info : Compile time = 287.643 ms 2025-09-07T06:28:18.3972936Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.3979510Z #22 240.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.3983110Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.3983822Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:18.3984437Z #22 240.9 ptxas info : Compile time = 241.639 ms 2025-09-07T06:28:18.3988109Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.3998150Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.4001848Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.4002590Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:18.4003207Z #22 240.9 ptxas info : Compile time = 293.059 ms 2025-09-07T06:28:18.4006824Z #22 240.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4013687Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.4018225Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.4019104Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:18.4019878Z #22 240.9 ptxas info : Compile time = 263.936 ms 2025-09-07T06:28:18.4024163Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4032338Z #22 240.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.4036709Z #22 240.9 48 bytes stack frame, 64 bytes spill stores, 116 bytes spill loads 2025-09-07T06:28:18.4037795Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:28:18.4038720Z #22 240.9 ptxas info : Compile time = 693.979 ms 2025-09-07T06:28:18.4043073Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4050930Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.4055411Z #22 240.9 48 bytes stack frame, 64 bytes spill stores, 116 bytes spill loads 2025-09-07T06:28:18.4056491Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:28:18.4057428Z #22 240.9 ptxas info : Compile time = 689.040 ms 2025-09-07T06:28:18.4061824Z 
#22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4069662Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.4074254Z #22 240.9 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:18.4075300Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:18.4076206Z #22 240.9 ptxas info : Compile time = 743.188 ms 2025-09-07T06:28:18.4080521Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4088424Z #22 240.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.4093397Z #22 240.9 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:18.4094474Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:18.4095416Z #22 240.9 ptxas info : Compile time = 708.860 ms 2025-09-07T06:28:18.4099637Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4107219Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.4111382Z #22 240.9 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:28:18.4112446Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:28:18.4113364Z #22 240.9 ptxas info : Compile time = 461.344 ms 2025-09-07T06:28:18.4117506Z #22 240.9 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4125066Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.4129252Z #22 240.9 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:28:18.4130313Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:28:18.4131516Z #22 240.9 ptxas info : Compile time = 447.721 ms 2025-09-07T06:28:18.4133780Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4137198Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:18.4139330Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.4140195Z #22 240.9 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:28:18.4140891Z #22 240.9 ptxas info : Compile time = 21.870 ms 2025-09-07T06:28:18.4145235Z #22 240.9 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4153433Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.4157876Z #22 240.9 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:18.4158944Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:18.4159897Z #22 240.9 ptxas info : Compile time = 492.581 ms 2025-09-07T06:28:18.4163972Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4171333Z #22 240.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:18.4175538Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.4176393Z #22 240.9 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:28:18.4177085Z #22 240.9 ptxas info : Compile time = 15.890 ms 2025-09-07T06:28:18.4181189Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4188601Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:18.4193299Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.4194140Z #22 240.9 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:28:18.4194873Z #22 240.9 ptxas info : Compile time = 15.917 ms 2025-09-07T06:28:18.4199718Z #22 240.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4207843Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:18.4212235Z #22 240.9 16 bytes stack frame, 20 bytes spill stores, 24 bytes spill loads 2025-09-07T06:28:18.4213418Z #22 240.9 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:18.4214350Z #22 240.9 ptxas info : Compile time = 460.083 ms 2025-09-07T06:28:18.4216483Z #22 240.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.4219968Z #22 240.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:18.4222153Z #22 240.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.4223001Z #22 240.9 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:28:18.4223688Z #22 240.9 ptxas info : Compile time = 24.474 ms 2025-09-07T06:28:18.5727038Z #22 241.2 [4/154] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_fwd_combine.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/flash_fwd_combine.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/flash_fwd_combine.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:28:18.7297565Z #22 241.2 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 
2025-09-07T06:28:18.7324950Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7329792Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7332932Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7334216Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7335249Z #22 241.2 ptxas info : Compile time = 49.068 ms 2025-09-07T06:28:18.7338782Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7343767Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7346690Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7347859Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7349026Z #22 241.2 ptxas info : Compile time = 40.071 ms 2025-09-07T06:28:18.7351831Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7356826Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7359636Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7360783Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7361796Z #22 241.2 ptxas info : Compile time = 37.380 ms 2025-09-07T06:28:18.7364354Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7368438Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7370804Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7371788Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7372711Z #22 241.2 ptxas info : Compile time = 35.598 ms 2025-09-07T06:28:18.7375295Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7379169Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7381815Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7382794Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7383618Z #22 241.2 
ptxas info : Compile time = 59.175 ms 2025-09-07T06:28:18.7385890Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7390128Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7396516Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7397497Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7398305Z #22 241.2 ptxas info : Compile time = 53.169 ms 2025-09-07T06:28:18.7400804Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7404596Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7406883Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7407850Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7408657Z #22 241.2 ptxas info : Compile time = 49.334 ms 2025-09-07T06:28:18.7410947Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7415323Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7417663Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7418717Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7419634Z #22 241.2 ptxas info : Compile time = 47.935 ms 2025-09-07T06:28:18.7422076Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7426104Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7428561Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7429617Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7430559Z #22 241.2 ptxas info : Compile time = 47.218 ms 2025-09-07T06:28:18.7432973Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7436989Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7439804Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7440880Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7441814Z #22 241.2 
ptxas info : Compile time = 41.848 ms 2025-09-07T06:28:18.7444235Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7448252Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7450703Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7451767Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7452805Z #22 241.2 ptxas info : Compile time = 37.342 ms 2025-09-07T06:28:18.7455301Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7459493Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7461951Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7463054Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7463956Z #22 241.2 ptxas info : Compile time = 35.642 ms 2025-09-07T06:28:18.7466417Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7470434Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7472896Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7473941Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7474849Z #22 241.2 ptxas info : Compile time = 58.810 ms 2025-09-07T06:28:18.7477285Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7481292Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7483723Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7484808Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7485729Z #22 241.2 ptxas info : Compile time = 54.912 ms 2025-09-07T06:28:18.7488167Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7492521Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7495160Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7496255Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7497172Z #22 241.2 
ptxas info : Compile time = 51.160 ms 2025-09-07T06:28:18.7499639Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7503657Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7506116Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7507198Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7508136Z #22 241.2 ptxas info : Compile time = 49.375 ms 2025-09-07T06:28:18.7510578Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7514768Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7517270Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7518318Z #22 241.2 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7519246Z #22 241.2 ptxas info : Compile time = 53.699 ms 2025-09-07T06:28:18.7521687Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7525710Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7528212Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7529246Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7530145Z #22 241.2 ptxas info : Compile time = 42.637 ms 2025-09-07T06:28:18.7532722Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7536176Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7538086Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7538875Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7539584Z #22 241.2 ptxas info : Compile time = 39.132 ms 2025-09-07T06:28:18.7541493Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7544584Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7546690Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7547484Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7548187Z #22 241.2 
ptxas info : Compile time = 35.536 ms 2025-09-07T06:28:18.7550079Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7553152Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7555054Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7555852Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7556530Z #22 241.2 ptxas info : Compile time = 34.638 ms 2025-09-07T06:28:18.7558419Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7561730Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7563636Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7564445Z #22 241.2 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7565121Z #22 241.2 ptxas info : Compile time = 67.257 ms 2025-09-07T06:28:18.7566991Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7570113Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7572007Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7573023Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7573704Z #22 241.2 ptxas info : Compile time = 56.930 ms 2025-09-07T06:28:18.7575598Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7578721Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7580621Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7581426Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7582111Z #22 241.2 ptxas info : Compile time = 51.963 ms 2025-09-07T06:28:18.7584003Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7587127Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7589007Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7590003Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7590681Z #22 241.2 
ptxas info : Compile time = 49.350 ms 2025-09-07T06:28:18.7592841Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7596043Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7597930Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7598734Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7599435Z #22 241.2 ptxas info : Compile time = 48.226 ms 2025-09-07T06:28:18.7601330Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7604705Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7606638Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7607422Z #22 241.2 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7608112Z #22 241.2 ptxas info : Compile time = 52.120 ms 2025-09-07T06:28:18.7609978Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7613263Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7615184Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7615985Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7616695Z #22 241.2 ptxas info : Compile time = 44.095 ms 2025-09-07T06:28:18.7618587Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7621757Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7623691Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7624486Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7625191Z #22 241.2 ptxas info : Compile time = 39.212 ms 2025-09-07T06:28:18.7627107Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7630299Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7632255Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7633065Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7634073Z #22 241.2 
ptxas info : Compile time = 35.570 ms 2025-09-07T06:28:18.7635998Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7639168Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7641096Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7641887Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7642589Z #22 241.2 ptxas info : Compile time = 34.705 ms 2025-09-07T06:28:18.7644480Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7647624Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7649801Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7650622Z #22 241.2 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7651312Z #22 241.2 ptxas info : Compile time = 67.314 ms 2025-09-07T06:28:18.7653390Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7656536Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7658474Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7659278Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7659989Z #22 241.2 ptxas info : Compile time = 57.162 ms 2025-09-07T06:28:18.7661894Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7665031Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7666962Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7667783Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7668461Z #22 241.2 ptxas info : Compile time = 52.212 ms 2025-09-07T06:28:18.7670400Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7673541Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7675425Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7676231Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7677123Z #22 241.2 
ptxas info : Compile time = 49.499 ms 2025-09-07T06:28:18.7679015Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7682166Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7684075Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7684888Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7685602Z #22 241.2 ptxas info : Compile time = 46.752 ms 2025-09-07T06:28:18.7687463Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7690565Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7692880Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7693930Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7694633Z #22 241.2 ptxas info : Compile time = 47.486 ms 2025-09-07T06:28:18.7696491Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7699599Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7701505Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7702361Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7703060Z #22 241.2 ptxas info : Compile time = 42.236 ms 2025-09-07T06:28:18.7704928Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7708060Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7709955Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7710770Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7711493Z #22 241.2 ptxas info : Compile time = 37.412 ms 2025-09-07T06:28:18.7713360Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7716498Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7718415Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7719227Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7719938Z #22 241.2 ptxas info : Compile time = 
35.816 ms 2025-09-07T06:28:18.7721820Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7725176Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7727053Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7727847Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7728541Z #22 241.2 ptxas info : Compile time = 59.249 ms 2025-09-07T06:28:18.7730379Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7733636Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7735522Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7736304Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7737182Z #22 241.2 ptxas info : Compile time = 55.435 ms 2025-09-07T06:28:18.7739046Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7742065Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7743949Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7744756Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7745440Z #22 241.2 ptxas info : Compile time = 51.596 ms 2025-09-07T06:28:18.7747320Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7750388Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7752267Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7753074Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7753780Z #22 241.2 ptxas info : Compile time = 49.816 ms 2025-09-07T06:28:18.7755676Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7759092Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7761334Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7762390Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7763298Z #22 241.2 ptxas info : Compile time = 
47.509 ms 2025-09-07T06:28:18.7765437Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7769260Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7771484Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7772595Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7773402Z #22 241.2 ptxas info : Compile time = 42.222 ms 2025-09-07T06:28:18.7775667Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7779397Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7781719Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7782694Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7783519Z #22 241.2 ptxas info : Compile time = 38.497 ms 2025-09-07T06:28:18.7785981Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7789720Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7792236Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7793250Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7794089Z #22 241.2 ptxas info : Compile time = 35.748 ms 2025-09-07T06:28:18.7796339Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7800126Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7802413Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7803395Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7804242Z #22 241.2 ptxas info : Compile time = 59.221 ms 2025-09-07T06:28:18.7806470Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7810254Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7812697Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7813637Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7814485Z #22 241.2 ptxas info : Compile time = 
55.248 ms 2025-09-07T06:28:18.7816679Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7820768Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7823094Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7824015Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7824876Z #22 241.2 ptxas info : Compile time = 51.556 ms 2025-09-07T06:28:18.7827107Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7830854Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7833171Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7834118Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7834958Z #22 241.2 ptxas info : Compile time = 49.374 ms 2025-09-07T06:28:18.7837509Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7841331Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7843593Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7844574Z #22 241.2 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7845429Z #22 241.2 ptxas info : Compile time = 53.862 ms 2025-09-07T06:28:18.7847700Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7851475Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7853954Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7854921Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7855733Z #22 241.2 ptxas info : Compile time = 44.041 ms 2025-09-07T06:28:18.7857954Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7861698Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7863981Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7864951Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7865774Z #22 241.2 ptxas info : Compile time = 
36.152 ms 2025-09-07T06:28:18.7868046Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7871766Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7874300Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7875287Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7876118Z #22 241.2 ptxas info : Compile time = 34.226 ms 2025-09-07T06:28:18.7878403Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7882155Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7884382Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7885360Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7886195Z #22 241.2 ptxas info : Compile time = 33.014 ms 2025-09-07T06:28:18.7888451Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7892861Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7895133Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7896082Z #22 241.2 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7896915Z #22 241.2 ptxas info : Compile time = 65.296 ms 2025-09-07T06:28:18.7899191Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7902972Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7905304Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7906258Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7907100Z #22 241.2 ptxas info : Compile time = 53.540 ms 2025-09-07T06:28:18.7909411Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7913188Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7915474Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7916451Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7917291Z #22 241.2 ptxas info : Compile time = 
48.626 ms 2025-09-07T06:28:18.7919559Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7923316Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7925973Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7926933Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7927769Z #22 241.2 ptxas info : Compile time = 45.881 ms 2025-09-07T06:28:18.7929980Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7933866Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7936160Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7937117Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7937963Z #22 241.2 ptxas info : Compile time = 44.643 ms 2025-09-07T06:28:18.7940240Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7944230Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7946534Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7947501Z #22 241.2 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7948347Z #22 241.2 ptxas info : Compile time = 78.004 ms 2025-09-07T06:28:18.7950582Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7954367Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7956654Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7957642Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7958487Z #22 241.2 ptxas info : Compile time = 67.454 ms 2025-09-07T06:28:18.7960784Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7964497Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7966854Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7967799Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7968649Z #22 241.2 ptxas info : Compile time = 
57.595 ms 2025-09-07T06:28:18.7970960Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7974877Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7977169Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7978390Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7979206Z #22 241.2 ptxas info : Compile time = 55.360 ms 2025-09-07T06:28:18.7981465Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7985189Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7987496Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7988473Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.7989309Z #22 241.2 ptxas info : Compile time = 51.297 ms 2025-09-07T06:28:18.7991621Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.7995634Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.7998164Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.7999169Z #22 241.2 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8000032Z #22 241.2 ptxas info : Compile time = 106.961 ms 2025-09-07T06:28:18.8002348Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8006141Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8008404Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8009414Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8010244Z #22 241.2 ptxas info : Compile time = 91.301 ms 2025-09-07T06:28:18.8012662Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8016486Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8018801Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8019786Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8020617Z #22 241.2 ptxas info : Compile time = 
82.179 ms 2025-09-07T06:28:18.8022914Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8026728Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8029036Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8030026Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8031221Z #22 241.2 ptxas info : Compile time = 71.731 ms 2025-09-07T06:28:18.8033524Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8037340Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8039634Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8040662Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8041515Z #22 241.2 ptxas info : Compile time = 73.511 ms 2025-09-07T06:28:18.8043752Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8047350Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8049569Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8050754Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8051629Z #22 241.2 ptxas info : Compile time = 72.032 ms 2025-09-07T06:28:18.8053998Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8057627Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8059860Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8060838Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8061690Z #22 241.2 ptxas info : Compile time = 59.065 ms 2025-09-07T06:28:18.8063889Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8067458Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8069658Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8070634Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8071472Z #22 241.2 ptxas info : Compile time = 54.464 ms 2025-09-07T06:28:18.8073606Z #22 241.2 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8077214Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8079471Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8080442Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8081296Z #22 241.2 ptxas info : Compile time = 54.735 ms 2025-09-07T06:28:18.8083511Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8087277Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8089490Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8090446Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8091283Z #22 241.2 ptxas info : Compile time = 91.731 ms 2025-09-07T06:28:18.8095560Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8098977Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8101197Z #22 241.2 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8102147Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8102965Z #22 241.2 ptxas info : Compile time = 86.533 ms 2025-09-07T06:28:18.8105560Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8109282Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8111563Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8112607Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8113484Z #22 241.2 ptxas info : Compile time = 79.010 ms 2025-09-07T06:28:18.8115738Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8119463Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8121726Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8122715Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8123571Z #22 241.2 ptxas info : Compile time = 74.019 ms 2025-09-07T06:28:18.8125767Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' 
for 'sm_80' 2025-09-07T06:28:18.8129410Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8131450Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8132335Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8133247Z #22 241.2 ptxas info : Compile time = 71.689 ms 2025-09-07T06:28:18.8135374Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8139079Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8141665Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8142683Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8143522Z #22 241.2 ptxas info : Compile time = 62.946 ms 2025-09-07T06:28:18.8145743Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8149350Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8151559Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8152568Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8153401Z 
#22 241.2 ptxas info : Compile time = 57.779 ms 2025-09-07T06:28:18.8155591Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8159404Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8161626Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8162580Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8163407Z #22 241.2 ptxas info : Compile time = 55.786 ms 2025-09-07T06:28:18.8165601Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8169160Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8171408Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8172596Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8173446Z #22 241.2 ptxas info : Compile time = 93.876 ms 2025-09-07T06:28:18.8175654Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8179209Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8181418Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8182406Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8183257Z #22 241.2 ptxas info : Compile time = 84.875 ms 2025-09-07T06:28:18.8185445Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8189054Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8191285Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8192795Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8193652Z #22 241.2 ptxas info : Compile time = 77.830 ms 2025-09-07T06:28:18.8195802Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8199394Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8201622Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8202596Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8203446Z #22 241.2 ptxas info : Compile time = 74.399 ms 2025-09-07T06:28:18.8205617Z #22 241.2 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8209217Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8211652Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8212798Z #22 241.2 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8213608Z #22 241.2 ptxas info : Compile time = 83.339 ms 2025-09-07T06:28:18.8215771Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8219337Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8221551Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8222522Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8223364Z #22 241.2 ptxas info : Compile time = 66.928 ms 2025-09-07T06:28:18.8225540Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8229144Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8231355Z #22 241.2 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8232338Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8233186Z #22 241.2 ptxas info : Compile time = 57.491 ms 2025-09-07T06:28:18.8235382Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8238942Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8241131Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8242107Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8242941Z #22 241.2 ptxas info : Compile time = 55.345 ms 2025-09-07T06:28:18.8245454Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8249037Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8251321Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8252317Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8253301Z #22 241.2 ptxas info : Compile time = 53.713 ms 2025-09-07T06:28:18.8255471Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' 
for 'sm_80' 2025-09-07T06:28:18.8259044Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8261237Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8262206Z #22 241.2 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8263242Z #22 241.2 ptxas info : Compile time = 107.457 ms 2025-09-07T06:28:18.8265439Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8269035Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8271270Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8272272Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8273147Z #22 241.2 ptxas info : Compile time = 90.173 ms 2025-09-07T06:28:18.8275390Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8279001Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8281256Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8282236Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8283080Z 
#22 241.2 ptxas info : Compile time = 79.899 ms 2025-09-07T06:28:18.8285335Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8288924Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8291160Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8292554Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8293421Z #22 241.2 ptxas info : Compile time = 75.033 ms 2025-09-07T06:28:18.8295666Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8299555Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8301853Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8302854Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8303717Z #22 241.2 ptxas info : Compile time = 72.135 ms 2025-09-07T06:28:18.8305966Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8309570Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8311875Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8312890Z #22 241.2 ptxas info : Used 55 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8313786Z #22 241.2 ptxas info : Compile time = 83.821 ms 2025-09-07T06:28:18.8316501Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8320418Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8322862Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8323879Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8324871Z #22 241.2 ptxas info : Compile time = 62.654 ms 2025-09-07T06:28:18.8327300Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8331227Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8333785Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8334840Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8335747Z #22 241.2 ptxas info : Compile time = 55.143 ms 2025-09-07T06:28:18.8337998Z #22 241.2 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8341965Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8344031Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8344974Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8345783Z #22 241.2 ptxas info : Compile time = 48.620 ms 2025-09-07T06:28:18.8347876Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8351343Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8353728Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8354648Z #22 241.2 ptxas info : Used 40 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8355467Z #22 241.2 ptxas info : Compile time = 50.332 ms 2025-09-07T06:28:18.8357609Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8361177Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8363357Z #22 241.2 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8364290Z #22 241.2 ptxas info : Used 52 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8365134Z #22 241.2 ptxas info : Compile time = 97.831 ms 2025-09-07T06:28:18.8367253Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8370925Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8373249Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8374173Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8374983Z #22 241.2 ptxas info : Compile time = 82.829 ms 2025-09-07T06:28:18.8377144Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8380669Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8382857Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8383783Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8384599Z #22 241.2 ptxas info : Compile time = 81.078 ms 2025-09-07T06:28:18.8386706Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' 
for 'sm_80' 2025-09-07T06:28:18.8390222Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8415231Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8416342Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8417150Z #22 241.2 ptxas info : Compile time = 77.624 ms 2025-09-07T06:28:18.8419300Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:18.8422755Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8424859Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8425825Z #22 241.2 ptxas info : Used 41 registers, used 1 barriers, 576 bytes cmem[0] 2025-09-07T06:28:18.8426963Z #22 241.2 ptxas info : Compile time = 76.004 ms 2025-09-07T06:28:18.8427551Z #22 241.2 ptxas info : 10 bytes gmem 2025-09-07T06:28:18.8429704Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8433464Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8435709Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8436547Z #22 241.2 
ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8437244Z #22 241.2 ptxas info : Compile time = 118.053 ms 2025-09-07T06:28:18.8439517Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8443536Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8445790Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8446638Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8447330Z #22 241.2 ptxas info : Compile time = 74.676 ms 2025-09-07T06:28:18.8449553Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8453462Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8455686Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8456533Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8457226Z #22 241.2 ptxas info : Compile time = 68.451 ms 2025-09-07T06:28:18.8459500Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8463239Z #22 241.2 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8465508Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8466366Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.8467052Z #22 241.2 ptxas info : Compile time = 66.288 ms 2025-09-07T06:28:18.8469331Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8473126Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8475402Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8476246Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8476945Z #22 241.2 ptxas info : Compile time = 109.657 ms 2025-09-07T06:28:18.8479445Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8483187Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8485485Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8486370Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8487097Z #22 241.2 ptxas info : Compile time = 100.045 ms 
2025-09-07T06:28:18.8489408Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8493644Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8495946Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8497053Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8497760Z #22 241.2 ptxas info : Compile time = 94.148 ms 2025-09-07T06:28:18.8499948Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8503718Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8506053Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8506908Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8507625Z #22 241.2 ptxas info : Compile time = 87.357 ms 2025-09-07T06:28:18.8509893Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8513799Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8516436Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8517428Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8518298Z #22 241.2 ptxas info : Compile time = 83.927 ms 2025-09-07T06:28:18.8520907Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8525263Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8527896Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8528891Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8529737Z #22 241.2 ptxas info : Compile time = 75.351 ms 2025-09-07T06:28:18.8532115Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8536245Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8538481Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8539299Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8539999Z #22 241.2 ptxas info : Compile time = 70.683 ms 
2025-09-07T06:28:18.8542202Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8545818Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8548040Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8548867Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.8549544Z #22 241.2 ptxas info : Compile time = 68.536 ms 2025-09-07T06:28:18.8551891Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8555525Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8557718Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8558476Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8559142Z #22 241.2 ptxas info : Compile time = 112.890 ms 2025-09-07T06:28:18.8561159Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8564549Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8566660Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8567398Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8568006Z #22 241.2 ptxas info : Compile time = 102.713 ms 2025-09-07T06:28:18.8569995Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8573573Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8575591Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8576337Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8576943Z #22 241.2 ptxas info : Compile time = 94.218 ms 2025-09-07T06:28:18.8578945Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8582674Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8584651Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8585465Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8586157Z #22 241.2 ptxas info : Compile time = 90.336 ms 
2025-09-07T06:28:18.8588412Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8592572Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8594941Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8595848Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.8596587Z #22 241.2 ptxas info : Compile time = 103.856 ms 2025-09-07T06:28:18.8600776Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8604723Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8607086Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8607987Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8608728Z #22 241.2 ptxas info : Compile time = 77.779 ms 2025-09-07T06:28:18.8611095Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8615241Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8617568Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8618490Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8619257Z #22 241.2 ptxas info : Compile time = 68.097 ms 2025-09-07T06:28:18.8621648Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8625662Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8628099Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8629001Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8629748Z #22 241.2 ptxas info : Compile time = 63.727 ms 2025-09-07T06:28:18.8632139Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8636081Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8638824Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8639732Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.8640484Z #22 241.2 ptxas info : Compile time = 63.616 ms 
2025-09-07T06:28:18.8642873Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8646906Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8649317Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8650204Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.8650977Z #22 241.2 ptxas info : Compile time = 126.859 ms 2025-09-07T06:28:18.8653491Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8657714Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8660110Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8661001Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8661747Z #22 241.2 ptxas info : Compile time = 102.705 ms 2025-09-07T06:28:18.8664096Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8668035Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8670502Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8671418Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8672165Z #22 241.2 ptxas info : Compile time = 93.081 ms 2025-09-07T06:28:18.8674567Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8678554Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8680968Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8681873Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8682611Z #22 241.2 ptxas info : Compile time = 85.925 ms 2025-09-07T06:28:18.8685004Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8688925Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8691291Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8692747Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8693477Z #22 241.2 ptxas info : Compile time = 82.527 ms 
2025-09-07T06:28:18.8695862Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8699795Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8702154Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8703041Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.8703766Z #22 241.2 ptxas info : Compile time = 97.503 ms 2025-09-07T06:28:18.8706193Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8710360Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8712704Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8713598Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8714321Z #22 241.2 ptxas info : Compile time = 77.778 ms 2025-09-07T06:28:18.8716715Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8720680Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8723042Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8723944Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8724679Z #22 241.2 ptxas info : Compile time = 68.022 ms 2025-09-07T06:28:18.8727095Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8731083Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8733604Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8734516Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8735259Z #22 241.2 ptxas info : Compile time = 63.609 ms 2025-09-07T06:28:18.8737713Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8741669Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8744018Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8744918Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.8745959Z #22 241.2 ptxas info : Compile time = 63.351 ms 
2025-09-07T06:28:18.8748347Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8752337Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8754774Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8755675Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.8756444Z #22 241.2 ptxas info : Compile time = 127.424 ms 2025-09-07T06:28:18.8758847Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8762858Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8765293Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8766351Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8767120Z #22 241.2 ptxas info : Compile time = 102.670 ms 2025-09-07T06:28:18.8769578Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8774386Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8776855Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8777732Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8778498Z #22 241.2 ptxas info : Compile time = 92.096 ms 2025-09-07T06:28:18.8780884Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8784895Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8787335Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8788243Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8789008Z #22 241.2 ptxas info : Compile time = 87.004 ms 2025-09-07T06:28:18.8791418Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8796404Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_10bfloat16_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8798805Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8799688Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8800419Z #22 241.2 ptxas info : Compile time = 84.648 ms 
2025-09-07T06:28:18.8802762Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8806902Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8809216Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8810117Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8810835Z #22 241.2 ptxas info : Compile time = 81.393 ms 2025-09-07T06:28:18.8813282Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8817149Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8819458Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8820358Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8821118Z #22 241.2 ptxas info : Compile time = 71.447 ms 2025-09-07T06:28:18.8823668Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8827988Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8830389Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8831317Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8832069Z #22 241.2 ptxas info : Compile time = 64.197 ms 2025-09-07T06:28:18.8834405Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8838377Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8840710Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8841612Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.8842364Z #22 241.2 ptxas info : Compile time = 61.108 ms 2025-09-07T06:28:18.8844704Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8848595Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8850984Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8851854Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8852731Z #22 241.2 ptxas info : Compile time = 100.721 ms 2025-09-07T06:28:18.8855053Z #22 241.2 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8859026Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8861675Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8862566Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8863333Z #22 241.2 ptxas info : Compile time = 80.090 ms 2025-09-07T06:28:18.8865736Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8869629Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8872003Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8872900Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8873683Z #22 241.2 ptxas info : Compile time = 76.075 ms 2025-09-07T06:28:18.8876186Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8880087Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 
2025-09-07T06:28:18.8882463Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8883358Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8884113Z #22 241.2 ptxas info : Compile time = 74.324 ms 2025-09-07T06:28:18.8886524Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8890384Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8893198Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8894087Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8894862Z #22 241.2 ptxas info : Compile time = 78.360 ms 2025-09-07T06:28:18.8897229Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8901154Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8903576Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8904509Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8905286Z #22 241.2 ptxas info : Compile time = 70.601 ms 2025-09-07T06:28:18.8907638Z #22 241.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8911518Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8914164Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8915066Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8915816Z #22 241.2 ptxas info : Compile time = 64.462 ms 2025-09-07T06:28:18.8918192Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8922052Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8924393Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8925282Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.8926033Z #22 241.2 ptxas info : Compile time = 60.778 ms 2025-09-07T06:28:18.8928888Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8933253Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8935647Z #22 241.2 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8936568Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8937308Z #22 241.2 ptxas info : Compile time = 101.496 ms 2025-09-07T06:28:18.8939642Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8943510Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8945917Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8946830Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8947560Z #22 241.2 ptxas info : Compile time = 91.388 ms 2025-09-07T06:28:18.8949947Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8953851Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8956242Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8957145Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8957884Z #22 241.2 ptxas info : Compile time = 81.641 ms 2025-09-07T06:28:18.8960232Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:28:18.8964069Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8966400Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8967299Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8968235Z #22 241.2 ptxas info : Compile time = 77.517 ms 2025-09-07T06:28:18.8970565Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8974840Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8977157Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8978061Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.8978792Z #22 241.2 ptxas info : Compile time = 87.078 ms 2025-09-07T06:28:18.8981083Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8984919Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8987276Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8988359Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8989101Z #22 241.2 ptxas 
info : Compile time = 67.375 ms 2025-09-07T06:28:18.8991427Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.8995596Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.8997972Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.8998840Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.8999590Z #22 241.2 ptxas info : Compile time = 58.736 ms 2025-09-07T06:28:18.9001953Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9005834Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9008185Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9009050Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9009780Z #22 241.2 ptxas info : Compile time = 53.217 ms 2025-09-07T06:28:18.9012113Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9016108Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9018490Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9019369Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.9020109Z #22 241.2 ptxas info : Compile time = 52.055 ms 2025-09-07T06:28:18.9022455Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9026600Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9028958Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9029840Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.9030597Z #22 241.2 ptxas info : Compile time = 105.313 ms 2025-09-07T06:28:18.9032932Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9036752Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9039088Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9039957Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9040700Z #22 241.2 ptxas info : Compile time = 106.122 ms 2025-09-07T06:28:18.9043250Z #22 241.2 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9047030Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9049344Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9050233Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9050954Z #22 241.2 ptxas info : Compile time = 87.611 ms 2025-09-07T06:28:18.9053367Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9057162Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9059524Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9060398Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9061144Z #22 241.2 ptxas info : Compile time = 76.148 ms 2025-09-07T06:28:18.9063518Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9067353Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm90EEEEEvNT_6ParamsE 
2025-09-07T06:28:18.9069731Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9070621Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9071367Z #22 241.2 ptxas info : Compile time = 74.330 ms 2025-09-07T06:28:18.9073730Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9077631Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9080188Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9081094Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.9081842Z #22 241.2 ptxas info : Compile time = 87.814 ms 2025-09-07T06:28:18.9084280Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9088167Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9090505Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9091383Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9092334Z #22 241.2 ptxas info : Compile time = 65.313 ms 2025-09-07T06:28:18.9094729Z #22 241.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9098841Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9101160Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9102029Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9102754Z #22 241.2 ptxas info : Compile time = 61.470 ms 2025-09-07T06:28:18.9105096Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9109170Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9111255Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9112087Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9112773Z #22 241.2 ptxas info : Compile time = 57.462 ms 2025-09-07T06:28:18.9114965Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9118587Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9120761Z #22 241.2 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9121601Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.9122304Z #22 241.2 ptxas info : Compile time = 55.353 ms 2025-09-07T06:28:18.9124499Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9128192Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9130436Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9131571Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.9132258Z #22 241.2 ptxas info : Compile time = 107.784 ms 2025-09-07T06:28:18.9134678Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9138370Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9140878Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9141849Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9142638Z #22 241.2 ptxas info : Compile time = 95.969 ms 2025-09-07T06:28:18.9145155Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:28:18.9149589Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9152159Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9153121Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9153902Z #22 241.2 ptxas info : Compile time = 83.198 ms 2025-09-07T06:28:18.9156412Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9160647Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9163182Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9164151Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9164953Z #22 241.2 ptxas info : Compile time = 76.537 ms 2025-09-07T06:28:18.9167503Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9171664Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1ENS_6half_tEfNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9173927Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9174731Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9175382Z #22 241.2 ptxas 
info : Compile time = 79.304 ms 2025-09-07T06:28:18.9177365Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9180594Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9182551Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9183355Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9184003Z #22 241.2 ptxas info : Compile time = 73.340 ms 2025-09-07T06:28:18.9186252Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9189491Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9191496Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9192540Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9193183Z #22 241.2 ptxas info : Compile time = 63.665 ms 2025-09-07T06:28:18.9195150Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9199017Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 
2025-09-07T06:28:18.9200988Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9201803Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9202721Z #22 241.2 ptxas info : Compile time = 60.373 ms 2025-09-07T06:28:18.9204717Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9208001Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9209947Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9210747Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.9211411Z #22 241.2 ptxas info : Compile time = 56.971 ms 2025-09-07T06:28:18.9213532Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9216731Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9218672Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9219450Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9220126Z #22 241.2 ptxas info : Compile time = 92.895 ms 2025-09-07T06:28:18.9222105Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:28:18.9225377Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9227399Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9228193Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9228864Z #22 241.2 ptxas info : Compile time = 82.929 ms 2025-09-07T06:28:18.9230888Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9234165Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9236505Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9237281Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9237945Z #22 241.2 ptxas info : Compile time = 77.347 ms 2025-09-07T06:28:18.9239938Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9243129Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9245128Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9245929Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9246604Z #22 241.2 ptxas info : Compile time = 74.727 ms 
2025-09-07T06:28:18.9248614Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9252131Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9254294Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9255064Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9255737Z #22 241.2 ptxas info : Compile time = 70.304 ms 2025-09-07T06:28:18.9257659Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9260888Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9262866Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9263622Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9264280Z #22 241.2 ptxas info : Compile time = 63.618 ms 2025-09-07T06:28:18.9266251Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9269542Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9271593Z #22 241.2 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9272401Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9273081Z #22 241.2 ptxas info : Compile time = 58.284 ms 2025-09-07T06:28:18.9275120Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9278403Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9280418Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9281439Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.9282112Z #22 241.2 ptxas info : Compile time = 56.954 ms 2025-09-07T06:28:18.9284121Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9287392Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9289411Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9290240Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9290909Z #22 241.2 ptxas info : Compile time = 90.877 ms 2025-09-07T06:28:18.9293351Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9296669Z #22 241.2 
ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9298693Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9299795Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9300479Z #22 241.2 ptxas info : Compile time = 81.393 ms 2025-09-07T06:28:18.9302498Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9305716Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9307737Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9308549Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9309217Z #22 241.2 ptxas info : Compile time = 74.910 ms 2025-09-07T06:28:18.9311236Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9314542Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi8EEENS5_ILi128EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9316561Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9317370Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9318057Z #22 241.2 ptxas info : Compile time = 77.876 ms 2025-09-07T06:28:18.9320062Z #22 241.2 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9323374Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9325398Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9326200Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.9326871Z #22 241.2 ptxas info : Compile time = 85.834 ms 2025-09-07T06:28:18.9328888Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9332609Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9334645Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9335450Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9336110Z #22 241.2 ptxas info : Compile time = 69.920 ms 2025-09-07T06:28:18.9338065Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9341259Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9343234Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads 2025-09-07T06:28:18.9344059Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9344723Z #22 241.2 ptxas info : Compile time = 60.522 ms 2025-09-07T06:28:18.9346971Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9350240Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9352294Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9353116Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9353817Z #22 241.2 ptxas info : Compile time = 64.453 ms 2025-09-07T06:28:18.9355818Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9359127Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9361165Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9361995Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.9362674Z #22 241.2 ptxas info : Compile time = 60.788 ms 2025-09-07T06:28:18.9364720Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9368082Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9370162Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9370997Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.9371697Z #22 241.2 ptxas info : Compile time = 122.883 ms 2025-09-07T06:28:18.9373891Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9377194Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9379502Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9380301Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9380975Z #22 241.2 ptxas info : Compile time = 100.140 ms 2025-09-07T06:28:18.9383023Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9386285Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9388296Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9389079Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9389745Z #22 241.2 ptxas info : Compile time = 88.894 ms 2025-09-07T06:28:18.9391775Z #22 241.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9395631Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9397712Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9398517Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9399209Z #22 241.2 ptxas info : Compile time = 83.172 ms 2025-09-07T06:28:18.9401240Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9404563Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm90EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9406624Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9407412Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9408089Z #22 241.2 ptxas info : Compile time = 81.154 ms 2025-09-07T06:28:18.9410102Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9413493Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9415479Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:28:18.9416263Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.9416946Z #22 241.2 ptxas info : Compile time = 94.948 ms 2025-09-07T06:28:18.9419064Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9422349Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9424409Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9425223Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9425909Z #22 241.2 ptxas info : Compile time = 75.023 ms 2025-09-07T06:28:18.9428318Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9431821Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9434103Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9434878Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9435594Z #22 241.2 ptxas info : Compile time = 65.494 ms 2025-09-07T06:28:18.9437671Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9441108Z #22 241.2 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9443042Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9443804Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9444605Z #22 241.2 ptxas info : Compile time = 61.381 ms 2025-09-07T06:28:18.9446893Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9451432Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb0EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9454266Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9455337Z #22 241.2 ptxas info : Used 42 registers, used 1 barriers 2025-09-07T06:28:18.9456184Z #22 241.2 ptxas info : Compile time = 60.910 ms 2025-09-07T06:28:18.9458859Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9463316Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi8ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9466047Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9467113Z #22 241.2 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:18.9467999Z #22 241.2 ptxas info : Compile time = 122.153 ms 2025-09-07T06:28:18.9470706Z #22 241.2 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9475144Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi7ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9477913Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9478870Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9479554Z #22 241.2 ptxas info : Compile time = 98.678 ms 2025-09-07T06:28:18.9481624Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9485106Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi6ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9487425Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9488269Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9488967Z #22 241.2 ptxas info : Compile time = 89.562 ms 2025-09-07T06:28:18.9491046Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9494826Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi5ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9496811Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:28:18.9497624Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9498314Z #22 241.2 ptxas info : Compile time = 84.849 ms 2025-09-07T06:28:18.9500090Z #22 241.2 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:18.9503449Z #22 241.2 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19FlashAttnFwdCombineIN4cute5tupleIJNS3_1CILi16EEENS5_ILi64EEEEEELi4ELi256ELi1ELb0ELb1EffNS_4arch4Sm80EEEEEvNT_6ParamsE 2025-09-07T06:28:18.9505348Z #22 241.2 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:18.9506402Z #22 241.2 ptxas info : Used 48 registers, used 1 barriers 2025-09-07T06:28:18.9507270Z #22 241.2 ptxas info : Compile time = 82.195 ms 2025-09-07T06:28:29.8684827Z #22 252.5 [5/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options 
''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:28:29.8702493Z #22 252.5 ptxas info : 130 bytes gmem, 104 bytes cmem[4] 2025-09-07T06:28:29.8707168Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:29.8716061Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:28:29.8720806Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:29.8721765Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:29.8722525Z #22 252.5 ptxas info : Compile time = 1.889 ms 2025-09-07T06:28:29.8726974Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:29.8734743Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:29.8738924Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:29.8739809Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:29.8740566Z #22 252.5 ptxas info : Compile time = 0.890 ms 2025-09-07T06:28:29.8744915Z #22 252.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:29.8753722Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:29.8758311Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:29.8759301Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:29.8760262Z #22 252.5 ptxas info : Compile time = 0.629 ms 2025-09-07T06:28:29.8765033Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0183906Z #22 252.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0189691Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0190762Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:30.0191661Z #22 252.5 ptxas info : Compile time = 0.580 ms 2025-09-07T06:28:30.0197311Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0207285Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0212355Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0213638Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:30.0214595Z #22 252.5 ptxas info : Compile time = 0.586 ms 2025-09-07T06:28:30.0219975Z #22 252.5 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0229130Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0234671Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0235793Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:30.0236696Z #22 252.5 ptxas info : Compile time = 0.554 ms 2025-09-07T06:28:30.0241739Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0251465Z #22 252.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0257089Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0258193Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:30.0259127Z #22 252.5 ptxas info : Compile time = 0.524 ms 2025-09-07T06:28:30.0264127Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0277137Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0282167Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0283193Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:30.0284083Z #22 252.5 ptxas info : Compile time = 0.540 ms 2025-09-07T06:28:30.0288617Z #22 252.5 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0298664Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0303537Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0304591Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:30.0305532Z #22 252.5 ptxas info : Compile time = 0.614 ms 2025-09-07T06:28:30.0310249Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0318962Z #22 252.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0324784Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0325823Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:30.0326751Z #22 252.5 ptxas info : Compile time = 0.551 ms 2025-09-07T06:28:30.0329112Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0333222Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:30.0335701Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0336711Z #22 252.5 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:30.0337642Z #22 252.5 ptxas info : Compile time = 73.647 ms 2025-09-07T06:28:30.0342886Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0352743Z #22 252.5 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0357747Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0358830Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:30.0359725Z #22 252.5 ptxas info : Compile time = 0.904 ms 2025-09-07T06:28:30.0364409Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0373026Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:30.0377646Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0378732Z #22 252.5 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:30.0379639Z #22 252.5 ptxas info : Compile time = 33.974 ms 2025-09-07T06:28:30.0384961Z #22 252.5 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0393996Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:30.0398687Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0399755Z #22 252.5 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:30.0400650Z #22 252.5 ptxas info : Compile time = 13.754 ms 2025-09-07T06:28:30.0405529Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0418287Z #22 252.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0423244Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0424326Z #22 252.5 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:30.0425266Z #22 252.5 ptxas info : Compile time = 0.830 ms 2025-09-07T06:28:30.0427601Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:30.0431577Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:30.0434028Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0435096Z #22 252.5 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:30.0436012Z #22 252.5 ptxas info : Compile time = 32.953 ms 2025-09-07T06:28:30.0436683Z #22 252.5 ptxas info : 10 bytes gmem 2025-09-07T06:28:30.0442067Z #22 252.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0451060Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0456224Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0457183Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:30.0458333Z #22 252.5 ptxas info : Compile time = 376.506 ms 2025-09-07T06:28:30.0463248Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0472896Z #22 252.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0477918Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0478902Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:30.0479718Z #22 252.5 ptxas info : Compile time = 389.717 ms 2025-09-07T06:28:30.0484805Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0609984Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0616031Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0617072Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:30.0617863Z #22 252.5 ptxas info : Compile time = 476.014 ms 2025-09-07T06:28:30.0622819Z #22 252.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0635087Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0640499Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0641487Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:30.0642273Z #22 252.5 ptxas info : Compile time = 422.802 ms 2025-09-07T06:28:30.0647137Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0657286Z #22 252.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0662298Z #22 252.5 48 bytes stack frame, 64 bytes spill stores, 116 bytes spill loads 2025-09-07T06:28:30.0663510Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:28:30.0664537Z #22 252.5 ptxas info : Compile time = 1099.459 ms 2025-09-07T06:28:30.0670298Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0679265Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0684294Z #22 252.5 48 bytes stack frame, 64 bytes spill stores, 116 bytes spill loads 2025-09-07T06:28:30.0685495Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:28:30.0686549Z #22 252.5 ptxas info : Compile time = 1104.346 ms 
2025-09-07T06:28:30.0691450Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0700947Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0706338Z #22 252.5 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:30.0707500Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:30.0708489Z #22 252.5 ptxas info : Compile time = 1193.434 ms 2025-09-07T06:28:30.0713381Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0722775Z #22 252.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0727784Z #22 252.5 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:30.0728993Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:30.0732953Z #22 252.5 ptxas info : Compile time = 1136.454 ms 2025-09-07T06:28:30.0737707Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0747346Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0752173Z #22 252.5 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:28:30.0753384Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:28:30.0754439Z #22 252.5 ptxas info : Compile time = 725.653 ms 2025-09-07T06:28:30.0759215Z #22 252.5 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0768348Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0773749Z #22 252.5 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:28:30.0774951Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:28:30.0775990Z #22 252.5 ptxas info : Compile time = 682.861 ms 2025-09-07T06:28:30.0778424Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0782408Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:30.0784872Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0785826Z #22 252.5 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:28:30.0786872Z #22 252.5 ptxas info : Compile time = 30.941 ms 2025-09-07T06:28:30.0792498Z #22 252.5 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0801855Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0806893Z #22 252.5 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:30.0808095Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:30.0809143Z #22 252.5 ptxas info : Compile time = 783.888 ms 2025-09-07T06:28:30.0832354Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0841008Z #22 252.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:30.0845751Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0846745Z #22 252.5 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:28:30.0847528Z #22 252.5 ptxas info : Compile time = 26.290 ms 2025-09-07T06:28:30.0852196Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0860951Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:30.0865655Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0867174Z #22 252.5 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:28:30.0867985Z #22 252.5 ptxas info : Compile time = 26.089 ms 2025-09-07T06:28:30.0872777Z #22 252.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0882183Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:30.0887157Z #22 252.5 16 bytes stack frame, 20 bytes spill stores, 24 bytes spill loads 2025-09-07T06:28:30.0888364Z #22 252.5 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:30.0889397Z #22 252.5 ptxas info : Compile time = 734.614 ms 2025-09-07T06:28:30.0892689Z #22 252.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:30.0897183Z #22 252.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:30.0899614Z #22 252.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:30.0900580Z #22 252.5 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:28:30.0901340Z #22 252.5 ptxas info : Compile time = 37.651 ms 2025-09-07T06:28:34.9222647Z #22 257.6 [6/154] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:28:34.9244375Z #22 
257.6 ptxas info : 130 bytes gmem, 104 bytes cmem[4] 2025-09-07T06:28:34.9249903Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9266876Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9269581Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9270178Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:34.9270712Z #22 257.6 ptxas info : Compile time = 1.908 ms 2025-09-07T06:28:34.9273632Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9278451Z #22 257.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9281682Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9282703Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:34.9283617Z #22 257.6 ptxas info : Compile time = 0.897 ms 2025-09-07T06:28:34.9289143Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9299300Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9304375Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9305472Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:34.9306354Z #22 257.6 ptxas info : Compile time = 0.666 ms 2025-09-07T06:28:34.9311283Z 
#22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9320782Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9325668Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9326710Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:34.9327530Z #22 257.6 ptxas info : Compile time = 0.582 ms 2025-09-07T06:28:34.9332431Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9342466Z #22 257.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9347700Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9348822Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:34.9349798Z #22 257.6 ptxas info : Compile time = 0.579 ms 2025-09-07T06:28:34.9354836Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9364264Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9369450Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9370566Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:34.9371527Z #22 257.6 ptxas info : Compile time = 0.603 ms 2025-09-07T06:28:34.9376890Z #22 
257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9385954Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9391304Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9395113Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:34.9396009Z #22 257.6 ptxas info : Compile time = 0.553 ms 2025-09-07T06:28:34.9401028Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9409630Z #22 257.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9412501Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9413262Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:34.9414078Z #22 257.6 ptxas info : Compile time = 0.553 ms 2025-09-07T06:28:34.9418831Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9427095Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9432222Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9433353Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:34.9434341Z #22 257.6 ptxas info : Compile time = 0.643 ms 2025-09-07T06:28:34.9439657Z #22 257.6 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9448960Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9454669Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9455779Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:34.9456804Z #22 257.6 ptxas info : Compile time = 0.565 ms 2025-09-07T06:28:34.9459455Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9463725Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:34.9466371Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9467517Z #22 257.6 ptxas info : Used 119 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:34.9468386Z #22 257.6 ptxas info : Compile time = 51.866 ms 2025-09-07T06:28:34.9472399Z #22 
257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9479775Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9482484Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9483091Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:34.9483604Z #22 257.6 ptxas info : Compile time = 0.991 ms 2025-09-07T06:28:34.9486238Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9490831Z #22 257.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:34.9493714Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9494322Z #22 257.6 ptxas info : Used 30 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:34.9494844Z #22 257.6 ptxas info : Compile time = 21.967 ms 2025-09-07T06:28:34.9498753Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9508474Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:34.9513471Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9514594Z #22 257.6 ptxas info : Used 29 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:34.9515553Z #22 257.6 ptxas info : Compile time = 19.142 ms 2025-09-07T06:28:34.9520532Z #22 257.6 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9530123Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9535266Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9536383Z #22 257.6 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:34.9537376Z #22 257.6 ptxas info : Compile time = 0.840 ms 2025-09-07T06:28:34.9539824Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:34.9543642Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:34.9545987Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9546907Z #22 257.6 ptxas info : Used 121 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:34.9547699Z #22 257.6 ptxas info : Compile time = 103.554 ms 
2025-09-07T06:28:34.9548313Z #22 257.6 ptxas info : 10 bytes gmem 2025-09-07T06:28:34.9553178Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9562807Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9567911Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9569192Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:28:34.9570025Z #22 257.6 ptxas info : Compile time = 340.724 ms 2025-09-07T06:28:34.9575366Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9584899Z #22 257.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9590019Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9591014Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:28:34.9591830Z #22 257.6 ptxas info : Compile time = 347.468 ms 2025-09-07T06:28:34.9597582Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9606957Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9612244Z #22 257.6 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:28:34.9613688Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:34.9614776Z #22 257.6 ptxas info : Compile time = 486.392 ms 2025-09-07T06:28:34.9635050Z 
#22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9644788Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9650182Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9651197Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:28:34.9652015Z #22 257.6 ptxas info : Compile time = 389.786 ms 2025-09-07T06:28:34.9657490Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9667638Z #22 257.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9673122Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9674126Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:28:34.9674972Z #22 257.6 ptxas info : Compile time = 424.263 ms 2025-09-07T06:28:34.9680556Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9690285Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9696029Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9697086Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:28:34.9697943Z #22 257.6 ptxas info : Compile time = 378.647 ms 2025-09-07T06:28:34.9702927Z #22 257.6 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9712947Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9718429Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9719462Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:28:34.9720299Z #22 257.6 ptxas info : Compile time = 519.070 ms 2025-09-07T06:28:34.9725313Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9734693Z #22 257.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9739974Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9741020Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:28:34.9741870Z #22 257.6 ptxas info : Compile time = 420.960 ms 2025-09-07T06:28:34.9747154Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9757316Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9762725Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9763757Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:28:34.9764608Z #22 257.6 ptxas info : Compile time = 411.763 ms 2025-09-07T06:28:34.9769733Z #22 257.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9779357Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9784499Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9785541Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:28:34.9786396Z #22 257.6 ptxas info : Compile time = 290.131 ms 2025-09-07T06:28:34.9788967Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9793955Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:34.9796573Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9797582Z #22 257.6 ptxas info : Used 124 registers, used 0 barriers 2025-09-07T06:28:34.9798777Z #22 257.6 ptxas info : Compile time = 36.387 ms 2025-09-07T06:28:34.9803966Z #22 257.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9812867Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:34.9817912Z #22 257.6 16 bytes stack frame, 20 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:34.9819125Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:34.9820192Z #22 257.6 ptxas info : Compile time = 367.230 ms 2025-09-07T06:28:34.9825200Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:34.9834090Z #22 257.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:34.9839074Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:34.9840071Z #22 257.6 ptxas info : Used 109 registers, used 1 barriers 2025-09-07T06:28:34.9840851Z #22 257.6 ptxas info : Compile time = 28.167 ms 2025-09-07T06:28:35.0710257Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:35.0719759Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:35.0724993Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:35.0726024Z #22 257.6 ptxas info : Used 89 registers, used 1 barriers 2025-09-07T06:28:35.0726903Z #22 257.6 ptxas info : Compile time = 24.777 ms 2025-09-07T06:28:35.0732585Z #22 257.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:35.0742914Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:35.0748435Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:35.0749480Z #22 257.6 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:28:35.0750349Z #22 257.6 ptxas info : Compile time = 258.766 ms 2025-09-07T06:28:35.0753000Z #22 257.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:35.0757466Z #22 257.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:35.0760018Z #22 257.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:35.0761053Z #22 257.6 ptxas info : Used 122 registers, used 0 barriers 2025-09-07T06:28:35.0761910Z #22 257.6 ptxas info : Compile time = 42.262 ms 2025-09-07T06:28:42.0109026Z #22 264.7 [7/154] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:28:42.1677066Z #22 264.7 
ptxas info : 130 bytes gmem, 104 bytes cmem[4] 2025-09-07T06:28:42.1681902Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1691237Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1696506Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1697593Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:42.1698399Z #22 264.7 ptxas info : Compile time = 1.949 ms 2025-09-07T06:28:42.1703284Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1711366Z #22 264.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1715979Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1716919Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:42.1717764Z #22 264.7 ptxas info : Compile time = 0.941 ms 2025-09-07T06:28:42.1722318Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1730597Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1735130Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1736098Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:42.1736912Z #22 264.7 ptxas info : Compile time = 0.643 ms 2025-09-07T06:28:42.1741364Z 
#22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1750242Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1754560Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1755400Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:42.1756145Z #22 264.7 ptxas info : Compile time = 0.574 ms 2025-09-07T06:28:42.1760165Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1768559Z #22 264.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1773301Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1774303Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:42.1775165Z #22 264.7 ptxas info : Compile time = 20.982 ms 2025-09-07T06:28:42.1779919Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1788669Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1793853Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1794839Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:42.1795680Z #22 264.7 ptxas info : Compile time = 0.721 ms 2025-09-07T06:28:42.1799728Z 
#22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1807711Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1812702Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1813668Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:42.1814459Z #22 264.7 ptxas info : Compile time = 0.642 ms 2025-09-07T06:28:42.1819013Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1827726Z #22 264.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1832190Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1833073Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:42.1833923Z #22 264.7 ptxas info : Compile time = 0.591 ms 2025-09-07T06:28:42.1838306Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1846621Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1851100Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1851953Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:42.1852853Z #22 264.7 ptxas info : Compile time = 0.666 ms 2025-09-07T06:28:42.1856676Z #22 264.7 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1864614Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1869801Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1870843Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:42.1871704Z #22 264.7 ptxas info : Compile time = 0.576 ms 2025-09-07T06:28:42.1874028Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1877859Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:42.1880043Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1881028Z #22 264.7 ptxas info : Used 47 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:42.1881885Z #22 264.7 ptxas info : Compile time = 19.618 ms 2025-09-07T06:28:42.1886871Z #22 264.7 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1896021Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1900305Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1901200Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:42.1901984Z #22 264.7 ptxas info : Compile time = 0.927 ms 2025-09-07T06:28:42.1906170Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1913776Z #22 264.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:42.1918209Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1919173Z #22 264.7 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:42.1919973Z #22 264.7 ptxas info : Compile time = 15.591 ms 2025-09-07T06:28:42.1924461Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x48x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1932874Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x48x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:42.1937163Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1938037Z #22 264.7 ptxas info : Used 29 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:42.1938797Z #22 264.7 ptxas info : Compile time = 11.463 ms 2025-09-07T06:28:42.1943371Z #22 264.7 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1952339Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1957018Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1957934Z #22 264.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:42.1958654Z #22 264.7 ptxas info : Compile time = 0.852 ms 2025-09-07T06:28:42.1960580Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:42.1963886Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:42.1966153Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1967140Z #22 264.7 ptxas info : Used 52 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:42.1967992Z #22 264.7 ptxas info : Compile time = 21.758 ms 2025-09-07T06:28:42.1968627Z 
#22 264.7 ptxas info : 10 bytes gmem 2025-09-07T06:28:42.1973444Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.1981859Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.1986649Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.1987536Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:42.1988255Z #22 264.7 ptxas info : Compile time = 189.506 ms 2025-09-07T06:28:42.1993183Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2001501Z #22 264.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2005453Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.2006193Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:42.2007102Z #22 264.7 ptxas info : Compile time = 188.210 ms 2025-09-07T06:28:42.2010857Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2018514Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2023084Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.2023961Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:42.2024699Z #22 264.7 ptxas info : Compile time = 240.601 ms 2025-09-07T06:28:42.2029201Z #22 264.7 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2037566Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2042204Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.2043102Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:42.2043864Z #22 264.7 ptxas info : Compile time = 211.976 ms 2025-09-07T06:28:42.2048799Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2057607Z #22 264.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2061975Z #22 264.7 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:28:42.2063076Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:28:42.2064012Z #22 264.7 ptxas info : Compile time = 217.758 ms 2025-09-07T06:28:42.2068972Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2077517Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2082171Z #22 264.7 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:28:42.2083291Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:28:42.2084248Z #22 264.7 ptxas info : Compile time = 
208.028 ms 2025-09-07T06:28:42.2088916Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2097956Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2102628Z #22 264.7 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:42.2103631Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:42.2104488Z #22 264.7 ptxas info : Compile time = 256.486 ms 2025-09-07T06:28:42.2108679Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2117093Z #22 264.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2122329Z #22 264.7 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:42.2123482Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:42.2124545Z #22 264.7 ptxas info : Compile time = 233.219 ms 2025-09-07T06:28:42.2129437Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2138141Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2142610Z #22 264.7 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:28:42.2143728Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:42.2144669Z #22 264.7 ptxas info : Compile time = 214.566 ms 
2025-09-07T06:28:42.2149291Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2157568Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2161984Z #22 264.7 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:28:42.2162979Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:42.2163879Z #22 264.7 ptxas info : Compile time = 204.233 ms 2025-09-07T06:28:42.2166049Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2169593Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:42.2172055Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.2173124Z #22 264.7 ptxas info : Used 44 registers, used 0 barriers 2025-09-07T06:28:42.2173867Z #22 264.7 ptxas info : Compile time = 
14.205 ms 2025-09-07T06:28:42.2178666Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2187375Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2192975Z #22 264.7 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:42.2194120Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:42.2195086Z #22 264.7 ptxas info : Compile time = 251.518 ms 2025-09-07T06:28:42.2199406Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2207205Z #22 264.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:42.2211365Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.2212245Z #22 264.7 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:28:42.2213128Z #22 264.7 ptxas info : Compile time = 18.145 ms 2025-09-07T06:28:42.2217388Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x48x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2225218Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x48x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:42.2229186Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.2229950Z #22 264.7 ptxas info : Used 47 registers, used 1 barriers 2025-09-07T06:28:42.2230923Z #22 264.7 ptxas info : Compile time = 13.271 ms 2025-09-07T06:28:42.2235072Z #22 264.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2243372Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:42.2248265Z #22 264.7 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:42.2249419Z #22 264.7 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:42.2250421Z #22 264.7 ptxas info : Compile time = 216.533 ms 2025-09-07T06:28:42.2253214Z #22 264.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:42.2256865Z #22 264.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:42.2259046Z #22 264.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:42.2259988Z #22 264.7 ptxas info : Used 50 registers, used 0 barriers 2025-09-07T06:28:42.2260758Z #22 264.7 ptxas info : Compile time = 16.479 ms 2025-09-07T06:28:50.4053638Z #22 
273.1 [8/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 
2025-09-07T06:28:50.5624402Z #22 273.1 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:28:50.5629555Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:50.5638882Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5644072Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5645258Z #22 273.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:28:50.5646255Z #22 273.1 ptxas info : Compile time = 1.946 ms 2025-09-07T06:28:50.5652260Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:50.5662672Z #22 273.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5668297Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5669418Z #22 273.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:28:50.5670422Z #22 273.1 ptxas info : Compile time = 0.903 ms 2025-09-07T06:28:50.5676063Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:50.5686382Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5692685Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5693875Z #22 273.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:28:50.5694856Z #22 273.1 ptxas info : Compile time = 21.098 ms 
2025-09-07T06:28:50.5700263Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:50.5710570Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5716039Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5717167Z #22 273.1 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:28:50.5718138Z #22 273.1 ptxas info : Compile time = 0.712 ms 2025-09-07T06:28:50.5724034Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:50.5734542Z #22 273.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5740206Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5741362Z #22 273.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:28:50.5742315Z #22 273.1 ptxas info : Compile time = 0.589 ms 2025-09-07T06:28:50.5747861Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:50.5758064Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5763797Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5764870Z #22 273.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:28:50.5765838Z #22 273.1 ptxas info : Compile time = 0.547 ms 
2025-09-07T06:28:50.5771329Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:50.5781607Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5787019Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5788127Z #22 273.1 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:28:50.5789096Z #22 273.1 ptxas info : Compile time = 0.547 ms 2025-09-07T06:28:50.5795174Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:50.5805515Z #22 273.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5811264Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5812516Z #22 273.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:28:50.5813449Z #22 273.1 ptxas info : Compile time = 0.558 ms 2025-09-07T06:28:50.5819129Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:50.5829404Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5835019Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5836183Z #22 273.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:28:50.5837146Z #22 273.1 ptxas info : Compile time = 0.587 ms 
2025-09-07T06:28:50.5837900Z #22 273.1 ptxas info : 10 bytes gmem 2025-09-07T06:28:50.5842999Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:50.5852980Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5858178Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5859205Z #22 273.1 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:50.5860074Z #22 273.1 ptxas info : Compile time = 608.099 ms 2025-09-07T06:28:50.5865731Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:50.5876247Z #22 273.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5881847Z #22 273.1 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:28:50.5883150Z #22 273.1 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:50.5884292Z #22 273.1 ptxas info : Compile time = 742.453 ms 2025-09-07T06:28:50.5890040Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:50.5901012Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5906593Z #22 273.1 32 bytes stack frame, 68 bytes spill stores, 84 bytes spill loads 2025-09-07T06:28:50.5907900Z #22 273.1 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:28:50.5909045Z #22 273.1 ptxas 
info : Compile time = 930.337 ms 2025-09-07T06:28:50.5914617Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:50.5924699Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5930562Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.5931619Z #22 273.1 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:50.5932664Z #22 273.1 ptxas info : Compile time = 844.646 ms 2025-09-07T06:28:50.5938337Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:50.5948840Z #22 273.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5954471Z #22 273.1 16 bytes stack frame, 12 bytes spill stores, 16 bytes spill loads 2025-09-07T06:28:50.5955751Z #22 273.1 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:50.5956834Z #22 273.1 ptxas info : Compile time = 915.275 ms 2025-09-07T06:28:50.5962440Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:50.5972894Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.5978586Z #22 273.1 40 bytes stack frame, 92 bytes spill stores, 128 bytes spill loads 2025-09-07T06:28:50.5979900Z #22 273.1 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:28:50.5981034Z #22 273.1 
ptxas info : Compile time = 1737.563 ms 2025-09-07T06:28:50.5986571Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:50.5996850Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.6002592Z #22 273.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:50.6003642Z #22 273.1 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:50.6004557Z #22 273.1 ptxas info : Compile time = 1184.636 ms 2025-09-07T06:28:50.6010106Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:50.6020729Z #22 273.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.6026367Z #22 273.1 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:28:50.6027672Z #22 273.1 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:50.6028791Z #22 273.1 ptxas info : Compile time = 1050.168 ms 2025-09-07T06:28:50.6034446Z #22 273.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:50.6044768Z #22 273.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:50.6050403Z #22 273.1 32 bytes stack frame, 64 bytes spill stores, 76 bytes spill loads 2025-09-07T06:28:50.6051590Z #22 273.1 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:28:50.6052940Z #22 273.1 ptxas 
info : Compile time = 1931.699 ms 2025-09-07T06:28:54.4082012Z #22 277.1 [9/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:28:54.5626784Z #22 277.1 ptxas info : 130 bytes gmem, 104 bytes cmem[4] 2025-09-07T06:28:54.5636826Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5646775Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5652299Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5653618Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:54.5654628Z #22 277.1 ptxas info : Compile time = 22.037 ms 2025-09-07T06:28:54.5660104Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5670043Z #22 277.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5675541Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5676718Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:54.5677735Z #22 277.1 ptxas info : Compile time = 1.045 ms 2025-09-07T06:28:54.5683170Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5693327Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5698227Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5699243Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:54.5700086Z #22 277.1 ptxas info : Compile time = 0.704 ms 2025-09-07T06:28:54.5704649Z #22 277.1 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5713476Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5718242Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5719230Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:54.5720123Z #22 277.1 ptxas info : Compile time = 0.619 ms 2025-09-07T06:28:54.5724686Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5733241Z #22 277.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5737936Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5738943Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:54.5739770Z #22 277.1 ptxas info : Compile time = 0.593 ms 2025-09-07T06:28:54.5744237Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5752777Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5757842Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5758850Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:54.5759698Z #22 277.1 ptxas info : Compile time = 0.580 ms 2025-09-07T06:28:54.5764408Z #22 277.1 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5773001Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5777333Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5778281Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:54.5779074Z #22 277.1 ptxas info : Compile time = 0.566 ms 2025-09-07T06:28:54.5783399Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5791272Z #22 277.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5795841Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5796727Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:54.5797517Z #22 277.1 ptxas info : Compile time = 0.549 ms 2025-09-07T06:28:54.5801683Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5809481Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5814475Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5815530Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:54.5816437Z #22 277.1 ptxas info : Compile time = 0.612 ms 2025-09-07T06:28:54.5820885Z #22 277.1 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5828912Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5833273Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5834585Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:54.5835427Z #22 277.1 ptxas info : Compile time = 0.568 ms 2025-09-07T06:28:54.5837674Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5841256Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:54.5843500Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5844517Z #22 277.1 ptxas info : Used 48 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:54.5845397Z #22 277.1 ptxas info : Compile time = 40.763 ms 2025-09-07T06:28:54.5850020Z #22 277.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5860765Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5866448Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5867445Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:54.5868292Z #22 277.1 ptxas info : Compile time = 0.901 ms 2025-09-07T06:28:54.5872769Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5880166Z #22 277.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:54.5884191Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5885164Z #22 277.1 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:54.5885969Z #22 277.1 ptxas info : Compile time = 35.073 ms 2025-09-07T06:28:54.5889906Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x48x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5898726Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x48x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:54.5903037Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5903977Z #22 277.1 ptxas info : Used 29 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:54.5904792Z #22 277.1 ptxas info : Compile time = 31.499 ms 2025-09-07T06:28:54.5909507Z #22 277.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5918084Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5923239Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5924272Z #22 277.1 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:54.5925176Z #22 277.1 ptxas info : Compile time = 0.812 ms 2025-09-07T06:28:54.5927607Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:54.5931526Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:54.5934006Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5935072Z #22 277.1 ptxas info : Used 51 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:54.5936014Z #22 277.1 ptxas info : Compile time = 42.453 ms 2025-09-07T06:28:54.5937052Z #22 277.1 ptxas info : 10 
bytes gmem 2025-09-07T06:28:54.5941828Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.5950985Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5956139Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5957100Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:54.5957875Z #22 277.1 ptxas info : Compile time = 289.887 ms 2025-09-07T06:28:54.5963099Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.5971918Z #22 277.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5977088Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5978071Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:54.5978842Z #22 277.1 ptxas info : Compile time = 291.522 ms 2025-09-07T06:28:54.5983828Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.5993236Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.5998074Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.5998997Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:54.5999771Z #22 277.1 ptxas info : Compile time = 364.275 ms 2025-09-07T06:28:54.6004646Z #22 277.1 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6014175Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.6019067Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.6020051Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:28:54.6020835Z #22 277.1 ptxas info : Compile time = 332.610 ms 2025-09-07T06:28:54.6026339Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6035672Z #22 277.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.6040554Z #22 277.1 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:28:54.6041696Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:28:54.6042748Z #22 277.1 ptxas info : Compile time = 351.925 ms 2025-09-07T06:28:54.6047656Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6057034Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.6062226Z #22 277.1 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:28:54.6063476Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:28:54.6064508Z #22 277.1 ptxas info : Compile time = 329.898 ms 
2025-09-07T06:28:54.6069705Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6079257Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.6084284Z #22 277.1 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:54.6085412Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:54.6086389Z #22 277.1 ptxas info : Compile time = 408.023 ms 2025-09-07T06:28:54.6091200Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6100883Z #22 277.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.6105883Z #22 277.1 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:54.6107096Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:54.6108160Z #22 277.1 ptxas info : Compile time = 369.051 ms 2025-09-07T06:28:54.6113268Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6121880Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.6126724Z #22 277.1 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:28:54.6127901Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:54.6128945Z #22 277.1 ptxas info : Compile time = 342.964 ms 2025-09-07T06:28:54.6133944Z #22 
277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6142974Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.6147475Z #22 277.1 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:28:54.6148483Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:54.6149349Z #22 277.1 ptxas info : Compile time = 331.988 ms 2025-09-07T06:28:54.6151356Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6154628Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:54.6157040Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.6157916Z #22 277.1 ptxas info : Used 48 registers, used 0 barriers 2025-09-07T06:28:54.6158619Z #22 277.1 ptxas info : Compile time = 24.093 ms 2025-09-07T06:28:54.6163278Z #22 277.1 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6171474Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.6176034Z #22 277.1 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:54.6177240Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:54.6178280Z #22 277.1 ptxas info : Compile time = 393.184 ms 2025-09-07T06:28:54.6182753Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6190971Z #22 277.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_RSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:54.6195832Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.6196771Z #22 277.1 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:28:54.6197546Z #22 277.1 ptxas info : Compile time = 28.263 ms 2025-09-07T06:28:54.6201917Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x48x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6210464Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x48x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:54.6215176Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.6216122Z #22 277.1 ptxas info : Used 47 registers, used 1 barriers 2025-09-07T06:28:54.6216907Z #22 277.1 ptxas info : Compile time = 20.063 ms 2025-09-07T06:28:54.6222220Z #22 277.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6231081Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:54.6235945Z #22 277.1 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:28:54.6237134Z #22 277.1 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:28:54.6238166Z #22 277.1 ptxas info : Compile time = 349.655 ms 2025-09-07T06:28:54.6240499Z #22 277.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:54.6244253Z #22 277.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:54.6246599Z #22 277.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:54.6247558Z #22 277.1 ptxas info : Used 48 registers, used 0 barriers 2025-09-07T06:28:54.6248346Z #22 277.1 ptxas info : Compile time = 27.318 ms 2025-09-07T06:28:56.1566083Z #22 278.8 [10/154] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:28:56.1585925Z #22 
278.8 ptxas info : 130 bytes gmem, 104 bytes cmem[4] 2025-09-07T06:28:56.1591497Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1601196Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1606431Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1607551Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:56.1608480Z #22 278.8 ptxas info : Compile time = 1.921 ms 2025-09-07T06:28:56.1613678Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1622979Z #22 278.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1628229Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1629366Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:56.1630282Z #22 278.8 ptxas info : Compile time = 0.877 ms 2025-09-07T06:28:56.1635374Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1645213Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1650424Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1651520Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:56.1652618Z #22 278.8 ptxas info : Compile time = 0.652 ms 2025-09-07T06:28:56.1657729Z 
#22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1667388Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1672652Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1673774Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:56.1674743Z #22 278.8 ptxas info : Compile time = 0.573 ms 2025-09-07T06:28:56.1679922Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1688991Z #22 278.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1696793Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1697877Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:56.1698822Z #22 278.8 ptxas info : Compile time = 0.576 ms 2025-09-07T06:28:56.1703890Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1713380Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1718964Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1720105Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:56.1721049Z #22 278.8 ptxas info : Compile time = 0.587 ms 2025-09-07T06:28:56.1741567Z #22 
278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1751533Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1756683Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1757787Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:56.1758686Z #22 278.8 ptxas info : Compile time = 0.558 ms 2025-09-07T06:28:56.1763601Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1773037Z #22 278.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1778219Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1779279Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:56.1780217Z #22 278.8 ptxas info : Compile time = 0.553 ms 2025-09-07T06:28:56.1785017Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1794392Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1799773Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1800856Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:56.1801751Z #22 278.8 ptxas info : Compile time = 0.645 ms 2025-09-07T06:28:56.1806632Z #22 278.8 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1815589Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1820471Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1821952Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:56.1822907Z #22 278.8 ptxas info : Compile time = 0.571 ms 2025-09-07T06:28:56.1825337Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1829299Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:56.1831832Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1832850Z #22 278.8 ptxas info : Used 84 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:56.1833671Z #22 278.8 ptxas info : Compile time = 34.371 ms 2025-09-07T06:28:56.1838604Z #22 278.8 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1847927Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1853246Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1854294Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:56.1855183Z #22 278.8 ptxas info : Compile time = 0.898 ms 2025-09-07T06:28:56.1860010Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1868838Z #22 278.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:56.1873675Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1874737Z #22 278.8 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:56.1875687Z #22 278.8 ptxas info : Compile time = 17.154 ms 2025-09-07T06:28:56.1880374Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1889323Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:56.1894503Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1895557Z #22 278.8 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:56.1896483Z #22 278.8 ptxas info : Compile time = 13.987 ms 2025-09-07T06:28:56.1901566Z #22 278.8 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1910857Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1915735Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1916786Z #22 278.8 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:56.1917635Z #22 278.8 ptxas info : Compile time = 0.808 ms 2025-09-07T06:28:56.1919754Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:56.1923494Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:56.1925841Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1926849Z #22 278.8 ptxas info : Used 87 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:56.1928284Z #22 278.8 ptxas info : Compile time = 39.230 ms 
2025-09-07T06:28:56.1928872Z #22 278.8 ptxas info : 10 bytes gmem 2025-09-07T06:28:56.1933829Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.1942899Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1948185Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1949185Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:56.1949988Z #22 278.8 ptxas info : Compile time = 250.231 ms 2025-09-07T06:28:56.1955419Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.1964635Z #22 278.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.1969769Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.1970729Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:56.1971538Z #22 278.8 ptxas info : Compile time = 255.443 ms 2025-09-07T06:28:56.1976697Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.2043902Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.2048928Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.2049843Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:56.2050615Z #22 278.8 ptxas info : Compile time = 340.507 ms 2025-09-07T06:28:56.2055647Z #22 278.8 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.2064804Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.2069715Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.2070653Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:56.2071438Z #22 278.8 ptxas info : Compile time = 287.640 ms 2025-09-07T06:28:56.2076799Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.2086054Z #22 278.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.2090769Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.2091704Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:56.2092915Z #22 278.8 ptxas info : Compile time = 307.385 ms 2025-09-07T06:28:56.2097748Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.2108135Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.2114389Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.2115837Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:56.2116633Z #22 278.8 ptxas info : Compile time = 276.908 ms 2025-09-07T06:28:56.2121725Z #22 278.8 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.2131201Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.2136500Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.2137513Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:56.2138302Z #22 278.8 ptxas info : Compile time = 375.915 ms 2025-09-07T06:28:56.2143134Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.2152626Z #22 278.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.2157798Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.2158794Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:56.2159640Z #22 278.8 ptxas info : Compile time = 326.453 ms 2025-09-07T06:28:56.2164649Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.2173976Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.2179058Z #22 278.8 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:28:56.2180298Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:56.2181347Z #22 278.8 ptxas info : Compile time = 290.926 ms 2025-09-07T06:28:56.2186254Z #22 278.8 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.2194983Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.2201724Z #22 278.8 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:28:56.2202882Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:56.2203887Z #22 278.8 ptxas info : Compile time = 266.954 ms 2025-09-07T06:28:56.2206174Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.2209913Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:56.2212240Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.2213405Z #22 278.8 ptxas info : Used 90 registers, used 0 barriers 2025-09-07T06:28:56.2214145Z #22 278.8 ptxas info : Compile time = 36.381 ms 2025-09-07T06:28:56.2219059Z #22 278.8 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.2228066Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.2232973Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.2233929Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:56.2245596Z #22 278.8 ptxas info : Compile time = 338.703 ms 2025-09-07T06:28:56.2250339Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.3066797Z #22 278.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x96x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:56.3071840Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.3072799Z #22 278.8 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:28:56.3073610Z #22 278.8 ptxas info : Compile time = 32.156 ms 2025-09-07T06:28:56.3078615Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.3087169Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:56.3091790Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.3093212Z #22 278.8 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:56.3094035Z #22 278.8 ptxas info : Compile time = 25.922 ms 2025-09-07T06:28:56.3099193Z #22 278.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.3107002Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:56.3111229Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.3112061Z #22 278.8 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:56.3112755Z #22 278.8 ptxas info : Compile time = 297.125 ms 2025-09-07T06:28:56.3114787Z #22 278.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:56.3118129Z #22 278.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:56.3120185Z #22 278.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:56.3121065Z #22 278.8 ptxas info : Used 76 registers, used 0 barriers 2025-09-07T06:28:56.3121835Z #22 278.8 ptxas info : Compile time = 44.124 ms 2025-09-07T06:28:58.3135053Z #22 281.0 [11/154] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:28:58.4721044Z #22 
281.0 ptxas info : 130 bytes gmem, 104 bytes cmem[4] 2025-09-07T06:28:58.4726730Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4736648Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4741886Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4743029Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:58.4743970Z #22 281.0 ptxas info : Compile time = 1.806 ms 2025-09-07T06:28:58.4749255Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4758849Z #22 281.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4764199Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4765304Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:58.4766284Z #22 281.0 ptxas info : Compile time = 0.843 ms 2025-09-07T06:28:58.4771580Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4781576Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4787165Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4788343Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:58.4789368Z #22 281.0 ptxas info : Compile time = 0.598 ms 2025-09-07T06:28:58.4795241Z #22 281.0 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4805332Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4810751Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4811866Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:58.4812917Z #22 281.0 ptxas info : Compile time = 0.575 ms 2025-09-07T06:28:58.4818520Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4828468Z #22 281.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4833893Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4835086Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:58.4836097Z #22 281.0 ptxas info : Compile time = 0.578 ms 2025-09-07T06:28:58.4841663Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4851483Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4857340Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4858489Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:58.4859474Z #22 281.0 ptxas info : Compile time = 0.576 ms 2025-09-07T06:28:58.4864868Z #22 281.0 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4875036Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4880441Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4881582Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:58.4882546Z #22 281.0 ptxas info : Compile time = 0.567 ms 2025-09-07T06:28:58.4887930Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4898018Z #22 281.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4903421Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4904569Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:58.4905464Z #22 281.0 ptxas info : Compile time = 0.570 ms 2025-09-07T06:28:58.4910783Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4919361Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4925171Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4926348Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:28:58.4927330Z #22 281.0 ptxas info : Compile time = 0.634 ms 2025-09-07T06:28:58.4932831Z #22 281.0 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4941913Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4947249Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4948655Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:58.4949669Z #22 281.0 ptxas info : Compile time = 0.581 ms 2025-09-07T06:28:58.4952257Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4956549Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:58.4958979Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4960123Z #22 281.0 ptxas info : Used 75 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:58.4961131Z #22 281.0 ptxas info : Compile time = 86.700 ms 2025-09-07T06:28:58.4966422Z #22 281.0 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4976659Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.4981955Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.4983136Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:58.4984117Z #22 281.0 ptxas info : Compile time = 0.912 ms 2025-09-07T06:28:58.4988869Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.4998396Z #22 281.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:58.5003308Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5004460Z #22 281.0 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:58.5005390Z #22 281.0 ptxas info : Compile time = 37.078 ms 2025-09-07T06:28:58.5010300Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.5019079Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:58.5024190Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5025349Z #22 281.0 ptxas info : Used 28 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:28:58.5026346Z #22 281.0 ptxas info : Compile time = 34.146 ms 2025-09-07T06:28:58.5031924Z #22 281.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.5041607Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5046795Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5047986Z #22 281.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:28:58.5048974Z #22 281.0 ptxas info : Compile time = 0.824 ms 2025-09-07T06:28:58.5051616Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:28:58.5055915Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:58.5058571Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5059746Z #22 281.0 ptxas info : Used 78 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:28:58.5060724Z #22 281.0 ptxas info : Compile time = 51.814 ms 2025-09-07T06:28:58.5061746Z #22 281.0 ptxas info : 10 
bytes gmem 2025-09-07T06:28:58.5067875Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5076773Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5081587Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5082516Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:58.5083268Z #22 281.0 ptxas info : Compile time = 242.237 ms 2025-09-07T06:28:58.5088309Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5099400Z #22 281.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5105175Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5105919Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:58.5106572Z #22 281.0 ptxas info : Compile time = 249.442 ms 2025-09-07T06:28:58.5111469Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5119073Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5123327Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5124154Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:58.5124810Z #22 281.0 ptxas info : Compile time = 333.114 ms 2025-09-07T06:28:58.5128913Z #22 281.0 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5137123Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5141513Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5142374Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:58.5143137Z #22 281.0 ptxas info : Compile time = 277.745 ms 2025-09-07T06:28:58.5148009Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5156422Z #22 281.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5160930Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5161820Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:58.5162534Z #22 281.0 ptxas info : Compile time = 295.983 ms 2025-09-07T06:28:58.5167117Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5175154Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5179821Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5180882Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:58.5181780Z #22 281.0 ptxas info : Compile time = 277.113 ms 2025-09-07T06:28:58.5186928Z #22 281.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5195050Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5199329Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5200106Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:58.5200774Z #22 281.0 ptxas info : Compile time = 374.247 ms 2025-09-07T06:28:58.5205131Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5229032Z #22 281.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5234350Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5235367Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:58.5236199Z #22 281.0 ptxas info : Compile time = 309.687 ms 2025-09-07T06:28:58.5240558Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5248697Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb0ELb1ELi3EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5252754Z #22 281.0 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:28:58.5253859Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:58.5254880Z #22 281.0 ptxas info : Compile time = 293.018 ms 2025-09-07T06:28:58.5259566Z #22 281.0 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5268334Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5273508Z #22 281.0 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:28:58.5274708Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:28:58.5275704Z #22 281.0 ptxas info : Compile time = 266.028 ms 2025-09-07T06:28:58.5278116Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5282060Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:58.5284475Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5285459Z #22 281.0 ptxas info : Used 76 registers, used 0 barriers 2025-09-07T06:28:58.5286271Z #22 281.0 ptxas info : Compile time = 36.731 ms 2025-09-07T06:28:58.5291694Z #22 281.0 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5301713Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li384ELb1ELb1ELi3EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5306884Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5307884Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:58.5308730Z #22 281.0 ptxas info : Compile time = 336.614 ms 2025-09-07T06:28:58.5313199Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5321198Z #22 281.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x96x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi3EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:58.5325676Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5326642Z #22 281.0 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:28:58.5327379Z #22 281.0 ptxas info : Compile time = 30.214 ms 2025-09-07T06:28:58.5331902Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5340121Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELi384ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi3EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:28:58.5343855Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5344607Z #22 281.0 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:28:58.5345264Z #22 281.0 ptxas info : Compile time = 23.807 ms 2025-09-07T06:28:58.5349421Z #22 281.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5358266Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi96EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi3ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li384ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:28:58.5362226Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5362965Z #22 281.0 ptxas info : Used 128 registers, used 16 barriers 2025-09-07T06:28:58.5363579Z #22 281.0 ptxas info : Compile time = 285.980 ms 2025-09-07T06:28:58.5365323Z #22 281.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:28:58.5368619Z #22 281.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:28:58.5370776Z #22 281.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:28:58.5371632Z #22 281.0 ptxas info : Used 74 registers, used 0 barriers 2025-09-07T06:28:58.5372551Z #22 281.0 ptxas info : Compile time = 44.757 ms 2025-09-07T06:29:08.0316723Z #22 290.7 [12/154] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:08.1934581Z #22 290.7 ptxas info : 130 bytes 
gmem, 104 bytes cmem[4] 2025-09-07T06:29:08.1940287Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.1950817Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.1956355Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.1957517Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:29:08.1958525Z #22 290.7 ptxas info : Compile time = 1.846 ms 2025-09-07T06:29:08.1964098Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.1974193Z #22 290.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.1979660Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.1980841Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:08.1981845Z #22 290.7 ptxas info : Compile time = 0.852 ms 2025-09-07T06:29:08.1987445Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.1997727Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2003595Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2004790Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:08.2005791Z #22 290.7 ptxas info : Compile time = 0.571 ms 2025-09-07T06:29:08.2011402Z #22 290.7 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2021810Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2027417Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2028586Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:08.2029585Z #22 290.7 ptxas info : Compile time = 0.566 ms 2025-09-07T06:29:08.2035102Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2045271Z #22 290.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2050706Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2051875Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:29:08.2053035Z #22 290.7 ptxas info : Compile time = 0.555 ms 2025-09-07T06:29:08.2058561Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2068588Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2074267Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2075381Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:08.2076339Z #22 290.7 ptxas info : Compile time = 0.531 ms 2025-09-07T06:29:08.2081791Z #22 290.7 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2091770Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2097664Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2099086Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:08.2100101Z #22 290.7 ptxas info : Compile time = 0.526 ms 2025-09-07T06:29:08.2105615Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2115752Z #22 290.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2121280Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2122415Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:08.2123416Z #22 290.7 ptxas info : Compile time = 0.524 ms 2025-09-07T06:29:08.2128746Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2138641Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2144004Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2145142Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:29:08.2146480Z #22 290.7 ptxas info : Compile time = 0.607 ms 2025-09-07T06:29:08.2151853Z #22 290.7 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2161574Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2166929Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2168120Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:08.2169071Z #22 290.7 ptxas info : Compile time = 0.595 ms 2025-09-07T06:29:08.2171802Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2176316Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:08.2178971Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2180147Z #22 290.7 ptxas info : Used 105 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:29:08.2181153Z #22 290.7 ptxas info : Compile time = 56.112 ms 2025-09-07T06:29:08.2186783Z #22 290.7 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2197119Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2202592Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2203756Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:08.2204738Z #22 290.7 ptxas info : Compile time = 1.006 ms 2025-09-07T06:29:08.2209814Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2219198Z #22 290.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:08.2224589Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2225773Z #22 290.7 ptxas info : Used 30 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:29:08.2226789Z #22 290.7 ptxas info : Compile time = 21.740 ms 2025-09-07T06:29:08.2231922Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2241236Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:08.2246641Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2247787Z #22 290.7 ptxas info : Used 29 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:29:08.2248799Z #22 290.7 ptxas info : Compile time = 18.172 ms 2025-09-07T06:29:08.2254526Z #22 290.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2264696Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2270149Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2271299Z #22 290.7 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:08.2272282Z #22 290.7 ptxas info : Compile time = 0.914 ms 2025-09-07T06:29:08.2274889Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:08.2279215Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:08.2281917Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2283061Z #22 290.7 ptxas info : Used 103 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:29:08.2284049Z #22 290.7 ptxas info : Compile time = 105.420 ms 2025-09-07T06:29:08.2284802Z #22 290.7 ptxas info : 10 
bytes gmem 2025-09-07T06:29:08.2290217Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2300925Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2306496Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2307552Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:29:08.2308443Z #22 290.7 ptxas info : Compile time = 342.158 ms 2025-09-07T06:29:08.2314258Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2324354Z #22 290.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2329870Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2330886Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:29:08.2331781Z #22 290.7 ptxas info : Compile time = 342.699 ms 2025-09-07T06:29:08.2337408Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2347352Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2352832Z #22 290.7 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:29:08.2354101Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:08.2355182Z #22 290.7 ptxas info : Compile time = 482.170 ms 2025-09-07T06:29:08.2360785Z #22 290.7 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2371253Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2376985Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2378042Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:29:08.2378936Z #22 290.7 ptxas info : Compile time = 379.120 ms 2025-09-07T06:29:08.2384468Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2395055Z #22 290.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2400579Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2401588Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:29:08.2402456Z #22 290.7 ptxas info : Compile time = 420.806 ms 2025-09-07T06:29:08.2408024Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2418115Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2423660Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2424739Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:29:08.2425620Z #22 290.7 ptxas info : Compile time = 385.786 ms 2025-09-07T06:29:08.2431153Z #22 290.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2441277Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2448750Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2449832Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:29:08.2450674Z #22 290.7 ptxas info : Compile time = 508.671 ms 2025-09-07T06:29:08.2456319Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2466345Z #22 290.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2472052Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2473099Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:29:08.2473952Z #22 290.7 ptxas info : Compile time = 415.871 ms 2025-09-07T06:29:08.2479376Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2489239Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb0ELb1ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2495150Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2496210Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:29:08.2497043Z #22 290.7 ptxas info : Compile time = 417.043 ms 2025-09-07T06:29:08.2502405Z #22 290.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2512175Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2517567Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2518651Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:29:08.2519852Z #22 290.7 ptxas info : Compile time = 375.250 ms 2025-09-07T06:29:08.2522504Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2526819Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:08.2529513Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2530559Z #22 290.7 ptxas info : Used 106 registers, used 0 barriers 2025-09-07T06:29:08.2531426Z #22 290.7 ptxas info : Compile time = 58.075 ms 2025-09-07T06:29:08.2537148Z #22 290.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2547493Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_21CollectiveEpilogueBwdISD_SE_SG_Li256ELb1ELb1ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2553107Z #22 290.7 16 bytes stack frame, 20 bytes spill stores, 20 bytes spill loads 2025-09-07T06:29:08.2554407Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers, 16 bytes cumulative stack size 2025-09-07T06:29:08.2555567Z #22 290.7 ptxas info : Compile time = 482.282 ms 2025-09-07T06:29:08.2560765Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2569966Z #22 290.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:08.2575226Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2576268Z #22 290.7 ptxas info : Used 109 registers, used 1 barriers 2025-09-07T06:29:08.2577142Z #22 290.7 ptxas info : Compile time = 45.398 ms 2025-09-07T06:29:08.2582257Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2591585Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:08.2597117Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2598183Z #22 290.7 ptxas info : Used 89 registers, used 1 barriers 2025-09-07T06:29:08.2599050Z #22 290.7 ptxas info : Compile time = 37.831 ms 2025-09-07T06:29:08.2604581Z #22 290.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2615032Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi1ELi1EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi80EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELi2ELi1ELi1ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISD_fSG_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:08.2620599Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2621644Z #22 290.7 ptxas info : Used 168 registers, used 10 barriers 2025-09-07T06:29:08.2622492Z #22 290.7 ptxas info : Compile time = 395.024 ms 2025-09-07T06:29:08.2625099Z #22 290.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:08.2629418Z #22 290.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:08.2632050Z #22 290.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:08.2633088Z #22 290.7 ptxas info : Used 96 registers, used 0 barriers 2025-09-07T06:29:08.2633985Z #22 290.7 ptxas info : Compile time = 68.643 ms 2025-09-07T06:29:14.7277959Z #22 297.4 [13/154] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:14.8822837Z #22 297.4 ptxas info : 130 bytes 
gmem, 104 bytes cmem[4] 2025-09-07T06:29:14.8827747Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.8836828Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.8841880Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.8842911Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:29:14.8843790Z #22 297.4 ptxas info : Compile time = 1.901 ms 2025-09-07T06:29:14.8848531Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.8857427Z #22 297.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.8862239Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.8863275Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:14.8864179Z #22 297.4 ptxas info : Compile time = 0.881 ms 2025-09-07T06:29:14.8866341Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.8869667Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:14.8871714Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.8872636Z #22 297.4 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:29:14.8873416Z #22 297.4 ptxas info : Compile time = 44.099 ms 2025-09-07T06:29:14.8877635Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.8885710Z #22 297.4 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.8890146Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.8891304Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:14.8892674Z #22 297.4 ptxas info : Compile time = 0.891 ms 2025-09-07T06:29:14.8897059Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.8904458Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:14.8908622Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.8909604Z #22 297.4 ptxas info : Used 30 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:29:14.8910462Z #22 297.4 ptxas info : Compile time = 15.849 ms 2025-09-07T06:29:14.8914866Z #22 297.4 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.8922767Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.8927146Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.8928148Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:14.8929018Z #22 297.4 ptxas info : Compile time = 0.855 ms 2025-09-07T06:29:14.8931338Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.8935244Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:14.8937575Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.8939063Z #22 297.4 ptxas info : Used 72 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:29:14.8939917Z #22 297.4 ptxas info : Compile time = 42.697 ms 2025-09-07T06:29:14.8944668Z #22 297.4 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.8953475Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.8958282Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.8959329Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:29:14.8960452Z #22 297.4 ptxas info : Compile time = 0.792 ms 2025-09-07T06:29:14.8965227Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.8974230Z #22 297.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.8979113Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.8980167Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:14.8981076Z #22 297.4 ptxas info : Compile time = 0.647 ms 2025-09-07T06:29:14.8985912Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.9046999Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9052331Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9053478Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:14.9054949Z #22 297.4 ptxas info : Compile time = 0.529 ms 2025-09-07T06:29:14.9059695Z #22 297.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.9068663Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9073472Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9074503Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:14.9075370Z #22 297.4 ptxas info : Compile time = 0.557 ms 2025-09-07T06:29:14.9080298Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.9088782Z #22 297.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9094014Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9095097Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:29:14.9096054Z #22 297.4 ptxas info : Compile time = 0.501 ms 2025-09-07T06:29:14.9100060Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.9106838Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9110707Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9111559Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:14.9112290Z #22 297.4 ptxas info : Compile time = 0.564 ms 2025-09-07T06:29:14.9114195Z #22 297.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.9117408Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9120041Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9121080Z #22 297.4 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:29:14.9122008Z #22 297.4 ptxas info : Compile time = 34.025 ms 2025-09-07T06:29:14.9126666Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.9133846Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9137644Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9138817Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:14.9139518Z #22 297.4 ptxas info : Compile time = 0.869 ms 2025-09-07T06:29:14.9142962Z #22 297.4 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA26MMA_64x128x16_F32F16F16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.9149459Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA26MMA_64x128x16_F32F16F16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9154599Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9155750Z #22 297.4 ptxas info : Used 31 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:29:14.9156518Z #22 297.4 ptxas info : Compile time = 19.712 ms 2025-09-07T06:29:14.9160013Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.9166452Z #22 297.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9170024Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9170860Z #22 297.4 ptxas info : Used 30 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:29:14.9171559Z #22 297.4 ptxas info : Compile time = 14.061 ms 2025-09-07T06:29:14.9176406Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.9183637Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9187537Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9188388Z #22 297.4 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:14.9189125Z #22 297.4 ptxas info : Compile time = 0.858 ms 2025-09-07T06:29:14.9191203Z #22 297.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:14.9194571Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9196471Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9197297Z #22 297.4 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:29:14.9198032Z #22 297.4 ptxas info : Compile time = 36.057 ms 2025-09-07T06:29:14.9198592Z #22 297.4 ptxas info : 10 bytes gmem 2025-09-07T06:29:14.9202559Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9209830Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9213975Z #22 297.4 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:29:14.9214964Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:14.9215842Z #22 297.4 ptxas info : Compile 
time = 433.836 ms 2025-09-07T06:29:14.9219993Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9227463Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9231969Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9232768Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:14.9233479Z #22 297.4 ptxas info : Compile time = 391.836 ms 2025-09-07T06:29:14.9235534Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9238863Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9240930Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9241761Z #22 297.4 ptxas info : Used 70 registers, used 0 barriers 2025-09-07T06:29:14.9242436Z #22 297.4 ptxas info : Compile time = 36.054 ms 
2025-09-07T06:29:14.9246764Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9254200Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9258638Z #22 297.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:29:14.9259591Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:14.9260548Z #22 297.4 ptxas info : Compile time = 504.549 ms 2025-09-07T06:29:14.9265366Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9273274Z #22 297.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x80x16_F32F16F16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9277967Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9279012Z #22 297.4 ptxas info : Used 64 registers, used 1 barriers 2025-09-07T06:29:14.9279870Z #22 297.4 ptxas info : Compile time = 25.294 ms 2025-09-07T06:29:14.9284137Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9296485Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9301589Z #22 297.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:29:14.9302775Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:14.9303897Z #22 297.4 ptxas info : Compile time = 391.875 ms 2025-09-07T06:29:14.9306361Z #22 297.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9310564Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9313274Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9314308Z #22 297.4 ptxas info : Used 72 registers, used 0 barriers 2025-09-07T06:29:14.9315157Z #22 297.4 ptxas info : Compile time = 28.263 ms 2025-09-07T06:29:14.9320490Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9330062Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9335697Z #22 297.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:29:14.9337017Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:14.9338133Z #22 297.4 ptxas info : Compile time = 354.357 ms 2025-09-07T06:29:14.9343339Z #22 297.4 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9352945Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9358262Z #22 297.4 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:29:14.9359854Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:14.9360897Z #22 297.4 ptxas info : Compile time = 404.120 ms 2025-09-07T06:29:14.9366206Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9376033Z #22 297.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9381430Z #22 297.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:29:14.9382705Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:29:14.9384027Z #22 297.4 ptxas info : Compile time = 514.536 ms 2025-09-07T06:29:14.9389385Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9399040Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9404461Z #22 297.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:29:14.9405713Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:29:14.9406841Z #22 297.4 ptxas info : Compile time = 416.776 ms 2025-09-07T06:29:14.9411946Z #22 
297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9422244Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9427311Z #22 297.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:29:14.9428578Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:14.9429685Z #22 297.4 ptxas info : Compile time = 430.345 ms 2025-09-07T06:29:14.9434838Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9445529Z #22 297.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9451582Z #22 297.4 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:29:14.9452975Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:14.9454076Z #22 297.4 ptxas info : Compile time = 266.107 ms 2025-09-07T06:29:14.9456377Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9460816Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9463463Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9464455Z #22 297.4 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:29:14.9465309Z #22 297.4 ptxas info : Compile time = 21.996 ms 2025-09-07T06:29:14.9470665Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9480518Z #22 297.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9485966Z #22 297.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:29:14.9487276Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:29:14.9488405Z #22 297.4 ptxas info : Compile time = 317.225 ms 2025-09-07T06:29:14.9493572Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA26MMA_64x128x16_F32F16F16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9501446Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA26MMA_64x128x16_F32F16F16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9506563Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9507337Z #22 297.4 ptxas info : Used 88 registers, used 1 barriers 2025-09-07T06:29:14.9507951Z #22 297.4 ptxas info : Compile time = 23.315 ms 2025-09-07T06:29:14.9511931Z #22 297.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9519437Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA25MMA_64x64x16_F32F16F16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9523531Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9524601Z #22 297.4 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:29:14.9525293Z #22 297.4 ptxas info : Compile time = 16.014 ms 2025-09-07T06:29:14.9529566Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9538642Z #22 297.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:14.9543434Z #22 297.4 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:29:14.9545316Z #22 297.4 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:29:14.9546360Z #22 297.4 ptxas info : Compile time = 261.154 ms 2025-09-07T06:29:14.9549173Z #22 297.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:14.9553036Z #22 297.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:14.9555817Z #22 297.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:14.9556803Z #22 297.4 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:29:14.9557618Z #22 297.4 ptxas info : Compile time = 24.941 ms 2025-09-07T06:29:17.8412876Z #22 300.5 [14/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:17.8431817Z #22 300.5 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:29:17.8436653Z #22 300.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:17.8445485Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8450319Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8451407Z #22 300.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:17.8452337Z #22 300.5 ptxas info : Compile time = 1.852 ms 2025-09-07T06:29:17.8457811Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:17.8467032Z #22 300.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8472175Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8473191Z #22 300.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:17.8474343Z #22 300.5 ptxas info : Compile time = 0.902 ms 2025-09-07T06:29:17.8479513Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:17.8488891Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8494370Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8495387Z #22 300.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:17.8496285Z #22 300.5 ptxas info : Compile time 
= 21.059 ms 2025-09-07T06:29:17.8501015Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:17.8509656Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8514358Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8515414Z #22 300.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:17.8516342Z #22 300.5 ptxas info : Compile time = 0.726 ms 2025-09-07T06:29:17.8521480Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:17.8530968Z #22 300.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8536283Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8537321Z #22 300.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:17.8538234Z #22 300.5 ptxas info : Compile time = 0.619 ms 2025-09-07T06:29:17.8543387Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:17.8552992Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8558163Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8559200Z #22 300.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:17.8560079Z #22 300.5 ptxas info : Compile time 
= 0.577 ms 2025-09-07T06:29:17.8565026Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:17.8573784Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8578440Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8579460Z #22 300.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:17.8580321Z #22 300.5 ptxas info : Compile time = 0.567 ms 2025-09-07T06:29:17.8585025Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:17.8594565Z #22 300.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8599701Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8600691Z #22 300.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:17.8601581Z #22 300.5 ptxas info : Compile time = 0.533 ms 2025-09-07T06:29:17.8606644Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:17.8616595Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8621589Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8622618Z #22 300.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:17.8623496Z #22 300.5 ptxas info : Compile time 
= 0.578 ms 2025-09-07T06:29:17.8624168Z #22 300.5 ptxas info : 10 bytes gmem 2025-09-07T06:29:17.8629049Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:17.8637399Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8642107Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8643052Z #22 300.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:17.8643844Z #22 300.5 ptxas info : Compile time = 395.232 ms 2025-09-07T06:29:17.8649135Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:17.8658545Z #22 300.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8663325Z #22 300.5 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:29:17.8664445Z #22 300.5 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:17.8665459Z #22 300.5 ptxas info : Compile time = 538.297 ms 2025-09-07T06:29:17.8670437Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:17.8679794Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8684991Z #22 300.5 40 bytes stack frame, 76 bytes spill stores, 84 bytes spill loads 2025-09-07T06:29:17.8686165Z #22 300.5 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:17.8687187Z 
#22 300.5 ptxas info : Compile time = 1019.137 ms 2025-09-07T06:29:17.8692319Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:17.8701215Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8705735Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.8706686Z #22 300.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:17.8707482Z #22 300.5 ptxas info : Compile time = 847.537 ms 2025-09-07T06:29:17.8712698Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:17.8722179Z #22 300.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8727344Z #22 300.5 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:29:17.8728485Z #22 300.5 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:17.8729428Z #22 300.5 ptxas info : Compile time = 1029.681 ms 2025-09-07T06:29:17.8735338Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:17.8746012Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.8751485Z #22 300.5 48 bytes stack frame, 100 bytes spill stores, 124 bytes spill loads 2025-09-07T06:29:17.8753313Z #22 300.5 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 
2025-09-07T06:29:17.8754325Z #22 300.5 ptxas info : Compile time = 2111.736 ms 2025-09-07T06:29:17.8759038Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:17.9895704Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.9900200Z #22 300.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:17.9901042Z #22 300.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:17.9901738Z #22 300.5 ptxas info : Compile time = 900.522 ms 2025-09-07T06:29:17.9906556Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:17.9915095Z #22 300.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.9919824Z #22 300.5 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:29:17.9920902Z #22 300.5 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:17.9921829Z #22 300.5 ptxas info : Compile time = 1180.288 ms 2025-09-07T06:29:17.9926663Z #22 300.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:17.9936028Z #22 300.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:17.9941349Z #22 300.5 40 bytes stack frame, 84 bytes spill stores, 88 bytes spill loads 2025-09-07T06:29:17.9942480Z #22 300.5 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:17.9943412Z 
#22 300.5 ptxas info : Compile time = 2189.211 ms 2025-09-07T06:29:19.3672272Z #22 302.0 [15/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:19.3692975Z #22 302.0 ptxas info : 130 bytes gmem, 104 bytes cmem[4] 2025-09-07T06:29:19.3698445Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3708349Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.3713724Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3714892Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:29:19.3715784Z #22 302.0 ptxas info : Compile time = 1.938 ms 2025-09-07T06:29:19.3721222Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3731482Z #22 302.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.3737673Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3739140Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:19.3740524Z #22 302.0 ptxas info : Compile time = 0.885 ms 2025-09-07T06:29:19.3744535Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3749172Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:19.3751842Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3752996Z #22 302.0 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:29:19.3753987Z #22 302.0 ptxas info : Compile time = 43.501 ms 2025-09-07T06:29:19.3759317Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3769217Z #22 302.0 ptxas info 
: Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.3774692Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3775852Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:19.3776831Z #22 302.0 ptxas info : Compile time = 0.833 ms 2025-09-07T06:29:19.3781992Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3791408Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:19.3797134Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3801911Z #22 302.0 ptxas info : Used 30 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:29:19.3802920Z #22 302.0 ptxas info : Compile time = 15.851 ms 2025-09-07T06:29:19.3808255Z #22 
302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3818332Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.3823706Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3825174Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:19.3826176Z #22 302.0 ptxas info : Compile time = 0.855 ms 2025-09-07T06:29:19.3828806Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3833201Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:19.3835911Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3837065Z #22 302.0 ptxas info : Used 72 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:29:19.3838078Z #22 302.0 ptxas info : Compile time = 44.501 ms 
2025-09-07T06:29:19.3843570Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3853777Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.3859224Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3860383Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:29:19.3861389Z #22 302.0 ptxas info : Compile time = 0.888 ms 2025-09-07T06:29:19.3866698Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3876811Z #22 302.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.3882285Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3883440Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:19.3884435Z #22 302.0 ptxas info : Compile time = 0.690 ms 2025-09-07T06:29:19.3889946Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3904065Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.3909595Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3910693Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:19.3911672Z #22 302.0 ptxas info : Compile time = 0.616 ms 2025-09-07T06:29:19.3917122Z #22 302.0 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3927099Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.3932721Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3933882Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:19.3934892Z #22 302.0 ptxas info : Compile time = 0.654 ms 2025-09-07T06:29:19.3940239Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3949924Z #22 302.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.3956222Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3957360Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 2176 bytes cmem[0] 2025-09-07T06:29:19.3958295Z #22 302.0 ptxas info : Compile time = 0.640 ms 2025-09-07T06:29:19.3964313Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3974817Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.3979956Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3981119Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:19.3982056Z #22 302.0 ptxas info : Compile time = 0.585 ms 2025-09-07T06:29:19.3984700Z #22 302.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.3988930Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:19.3991631Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.3993033Z #22 302.0 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:29:19.3994027Z #22 302.0 ptxas info : Compile time = 33.566 ms 2025-09-07T06:29:19.3999439Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.4009498Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4015097Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4016247Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:19.4017225Z #22 302.0 ptxas info : Compile time = 0.843 ms 2025-09-07T06:29:19.4022297Z #22 302.0 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA28MMA_64x128x16_F32BF16BF16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.4031770Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA28MMA_64x128x16_F32BF16BF16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:19.4036761Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4037904Z #22 302.0 ptxas info : Used 31 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:29:19.4038916Z #22 302.0 ptxas info : Compile time = 19.749 ms 2025-09-07T06:29:19.4044323Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.4053792Z #22 302.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:19.4058957Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4060134Z #22 302.0 ptxas info : Used 30 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:29:19.4061135Z #22 302.0 ptxas info : Compile time = 14.143 ms 2025-09-07T06:29:19.4066643Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.4076600Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4081983Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4083144Z #22 302.0 ptxas info : Used 4 registers, used 0 barriers, 1728 bytes cmem[0] 2025-09-07T06:29:19.4084156Z #22 302.0 ptxas info : Compile time = 0.863 ms 2025-09-07T06:29:19.4086775Z #22 302.0 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:19.4091157Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:19.4094246Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4095682Z #22 302.0 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:29:19.4096665Z #22 302.0 ptxas info : Compile time = 37.092 ms 2025-09-07T06:29:19.4097388Z #22 302.0 ptxas info : 10 bytes gmem 2025-09-07T06:29:19.4102807Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4112760Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4118233Z #22 302.0 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:29:19.4119673Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 
2025-09-07T06:29:19.4120793Z #22 302.0 ptxas info : Compile time = 427.741 ms 2025-09-07T06:29:19.4126248Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4136427Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4141927Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4142951Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:19.4143819Z #22 302.0 ptxas info : Compile time = 432.528 ms 2025-09-07T06:29:19.4146501Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4150874Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:19.4153495Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4154535Z #22 302.0 ptxas info : Used 70 registers, used 0 barriers 
2025-09-07T06:29:19.4155388Z #22 302.0 ptxas info : Compile time = 42.973 ms 2025-09-07T06:29:19.4160847Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4170908Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4176623Z #22 302.0 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:29:19.4177937Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:19.4179056Z #22 302.0 ptxas info : Compile time = 551.971 ms 2025-09-07T06:29:19.4184285Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4295032Z #22 302.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x80x16_F32BF16BF16_SSILNSF_5MajorE1ELSH_0ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESN_EEENS4_IJSN_NS5_ILi0EEESP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:19.4300321Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4301358Z #22 302.0 ptxas info : Used 64 registers, used 1 barriers 2025-09-07T06:29:19.4302221Z #22 302.0 ptxas info : Compile time = 28.737 ms 2025-09-07T06:29:19.4307773Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4317688Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi80EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4323153Z #22 302.0 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:29:19.4324419Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:19.4325550Z #22 302.0 ptxas info : Compile time = 455.687 ms 2025-09-07T06:29:19.4328240Z #22 302.0 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4332774Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi80EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:19.4335408Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4336425Z #22 302.0 ptxas info : Used 72 registers, used 0 barriers 2025-09-07T06:29:19.4337283Z #22 302.0 ptxas info : Compile time = 47.094 ms 2025-09-07T06:29:19.4343007Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4353064Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4358570Z #22 302.0 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:29:19.4359866Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:19.4360991Z #22 302.0 ptxas info : Compile time = 426.521 ms 2025-09-07T06:29:19.4366573Z 
#22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4376848Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4382357Z #22 302.0 40 bytes stack frame, 48 bytes spill stores, 72 bytes spill loads 2025-09-07T06:29:19.4383702Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:19.4384781Z #22 302.0 ptxas info : Compile time = 390.343 ms 2025-09-07T06:29:19.4390317Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4400718Z #22 302.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4406180Z #22 302.0 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:29:19.4407484Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:29:19.4408621Z #22 302.0 ptxas info : Compile time = 517.254 ms 2025-09-07T06:29:19.4414288Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4424513Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4429978Z #22 302.0 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:29:19.4431306Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:29:19.4432451Z #22 302.0 ptxas info : Compile time = 420.473 ms 
2025-09-07T06:29:19.4439607Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4449887Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb0ELb0ELi1EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4455170Z #22 302.0 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:29:19.4456874Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:19.4457994Z #22 302.0 ptxas info : Compile time = 426.196 ms 2025-09-07T06:29:19.4463250Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4473795Z #22 302.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4479004Z #22 302.0 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:29:19.4480300Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:19.4481422Z #22 302.0 ptxas info : Compile time = 391.583 ms 2025-09-07T06:29:19.4484094Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4488477Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:19.4491381Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4492771Z #22 302.0 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:29:19.4493628Z #22 302.0 ptxas info : Compile time = 36.567 ms 2025-09-07T06:29:19.4499172Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4509177Z #22 302.0 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_21CollectiveEpilogueBwdISC_SD_SF_Li256ELb1ELb0ELi1EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4514655Z #22 302.0 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:29:19.4516186Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:29:19.4517309Z #22 302.0 ptxas info : Compile time = 417.692 ms 2025-09-07T06:29:19.4522375Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA28MMA_64x128x16_F32BF16BF16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4533050Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA28MMA_64x128x16_F32BF16BF16_RSILNSE_5MajorE0ELSG_1ELNSE_7ScaleInE1ELSH_1EEEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi1EEESM_EEENS4_IJSM_NS5_ILi0EEESO_EEEEENS4_IJNS3_10UnderscoreESR_SR_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:19.4540179Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4541210Z #22 302.0 ptxas info : Used 88 registers, used 1 barriers 2025-09-07T06:29:19.4542064Z #22 302.0 ptxas info : Compile time = 39.304 ms 2025-09-07T06:29:19.4547264Z #22 302.0 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4556745Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_4SM904GMMA27MMA_64x64x16_F32BF16BF16_SSILNSF_5MajorE0ELSH_1ELNSF_7ScaleInE1ELSI_1EEEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi2EEESM_EEENS4_IJNS5_ILi0EEESM_SP_EEEEENS4_IJNS3_10UnderscoreESS_SS_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:29:19.4561951Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4562970Z #22 302.0 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:29:19.4563820Z #22 302.0 ptxas info : Compile time = 25.740 ms 2025-09-07T06:29:19.4569640Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4579727Z #22 302.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnBwdSm90INS1_25CollectiveMainloopBwdSm90ILi2ELi2ELi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi64EEENS7_ILi128EEESB_EEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ELi2ELi1ELi2ELi1ELb0EEENS1_24CollectiveEpilogueBwdGQAISC_fSF_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:19.4585064Z #22 302.0 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:29:19.4586364Z #22 302.0 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:29:19.4587498Z #22 302.0 ptxas info : Compile time = 423.430 ms 2025-09-07T06:29:19.4590419Z #22 302.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:19.4594792Z #22 302.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:29:19.4597472Z #22 302.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:19.4598530Z #22 302.0 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:29:19.4599382Z #22 302.0 ptxas info : Compile time = 41.250 ms 2025-09-07T06:29:22.8306629Z #22 305.5 [16/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:22.9927463Z #22 305.5 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:29:22.9930890Z #22 305.5 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:22.9937170Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:22.9940551Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:22.9941322Z #22 305.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:22.9941982Z #22 305.5 ptxas info : Compile time = 68.622 ms 2025-09-07T06:29:22.9945995Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:22.9952700Z #22 305.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:22.9956324Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:22.9957100Z #22 305.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:22.9957759Z #22 305.5 ptxas info : Compile time = 1.161 ms 2025-09-07T06:29:22.9961351Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:22.9967954Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:22.9971648Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:22.9972617Z #22 305.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:22.9973260Z #22 305.5 ptxas info : Compile time 
= 0.995 ms 2025-09-07T06:29:22.9978002Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:22.9989552Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0002376Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:23.0003520Z #22 305.5 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:29:23.0004522Z #22 305.5 ptxas info : Compile time = 0.775 ms 2025-09-07T06:29:23.0010306Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:23.0020442Z #22 305.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0025996Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:23.0027144Z #22 305.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:23.0028136Z #22 305.5 ptxas info : Compile time = 0.666 ms 2025-09-07T06:29:23.0033840Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:23.0044103Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0049776Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:23.0050948Z #22 305.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:23.0051937Z #22 305.5 ptxas info : Compile time 
= 0.622 ms 2025-09-07T06:29:23.0057551Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:23.0067828Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0073194Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:23.0074312Z #22 305.5 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:29:23.0075289Z #22 305.5 ptxas info : Compile time = 0.612 ms 2025-09-07T06:29:23.0081034Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:23.0091562Z #22 305.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0097361Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:23.0098518Z #22 305.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:23.0099484Z #22 305.5 ptxas info : Compile time = 0.600 ms 2025-09-07T06:29:23.0105203Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:23.0116803Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0122332Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:23.0123458Z #22 305.5 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:23.0124444Z #22 305.5 ptxas info : Compile time 
= 0.616 ms 2025-09-07T06:29:23.0125163Z #22 305.5 ptxas info : 10 bytes gmem 2025-09-07T06:29:23.0130244Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:23.0140187Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0145417Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:23.0146462Z #22 305.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:23.0147348Z #22 305.5 ptxas info : Compile time = 634.480 ms 2025-09-07T06:29:23.0153039Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:23.0163564Z #22 305.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0169301Z #22 305.5 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:29:23.0170607Z #22 305.5 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:23.0171758Z #22 305.5 ptxas info : Compile time = 770.426 ms 2025-09-07T06:29:23.0177689Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:23.0188266Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0194382Z #22 305.5 32 bytes stack frame, 68 bytes spill stores, 84 bytes spill loads 2025-09-07T06:29:23.0195851Z #22 305.5 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:29:23.0197164Z 
#22 305.5 ptxas info : Compile time = 1477.979 ms 2025-09-07T06:29:23.0203737Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:23.0215099Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0221301Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:23.0222391Z #22 305.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:23.0223267Z #22 305.5 ptxas info : Compile time = 1391.973 ms 2025-09-07T06:29:23.0229091Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:23.0240223Z #22 305.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0246222Z #22 305.5 16 bytes stack frame, 12 bytes spill stores, 16 bytes spill loads 2025-09-07T06:29:23.0247518Z #22 305.5 ptxas info : Used 168 registers, used 16 barriers, 16 bytes cumulative stack size 2025-09-07T06:29:23.0248677Z #22 305.5 ptxas info : Compile time = 1424.955 ms 2025-09-07T06:29:23.0253229Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:23.0259805Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0263407Z #22 305.5 40 bytes stack frame, 92 bytes spill stores, 128 bytes spill loads 2025-09-07T06:29:23.0264265Z #22 305.5 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 
2025-09-07T06:29:23.0264994Z #22 305.5 ptxas info : Compile time = 1997.856 ms 2025-09-07T06:29:23.0268532Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:23.0274914Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0278740Z #22 305.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:23.0279421Z #22 305.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:23.0279994Z #22 305.5 ptxas info : Compile time = 1153.865 ms 2025-09-07T06:29:23.0296174Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:23.0303196Z #22 305.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0308703Z #22 305.5 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:29:23.0309732Z #22 305.5 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:23.0310973Z #22 305.5 ptxas info : Compile time = 958.052 ms 2025-09-07T06:29:23.0317692Z #22 305.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:23.0329638Z #22 305.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:23.0335683Z #22 305.5 32 bytes stack frame, 64 bytes spill stores, 76 bytes spill loads 2025-09-07T06:29:23.0337034Z #22 305.5 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:29:23.0338170Z 
#22 305.5 ptxas info : Compile time = 1308.890 ms 2025-09-07T06:29:41.3585032Z #22 324.0 [17/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 
--generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:41.5192158Z #22 324.0 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:29:41.5198112Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:41.5208055Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5213664Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5214768Z #22 324.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:41.5215755Z #22 324.0 ptxas info : Compile time = 1.872 ms 2025-09-07T06:29:41.5221471Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:41.5231625Z #22 324.0 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5235263Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5236027Z #22 324.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:41.5236684Z #22 324.0 ptxas info : Compile time = 0.920 ms 2025-09-07T06:29:41.5240247Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:41.5247451Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5251084Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5251853Z #22 324.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:41.5252972Z #22 324.0 ptxas info : Compile time 
= 30.884 ms 2025-09-07T06:29:41.5258469Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:41.5268570Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5273937Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5275146Z #22 324.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:41.5276138Z #22 324.0 ptxas info : Compile time = 0.848 ms 2025-09-07T06:29:41.5281963Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:41.5292905Z #22 324.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5298767Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5299916Z #22 324.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:41.5300944Z #22 324.0 ptxas info : Compile time = 0.668 ms 2025-09-07T06:29:41.5305035Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:41.5311904Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5315565Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5316328Z #22 324.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:41.5316976Z #22 324.0 ptxas info : Compile time = 0.605 ms 
2025-09-07T06:29:41.5320345Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:41.5328708Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5334354Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5335544Z #22 324.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:41.5336554Z #22 324.0 ptxas info : Compile time = 0.554 ms 2025-09-07T06:29:41.5342381Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:41.5352909Z #22 324.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5358769Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5359932Z #22 324.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:41.5360936Z #22 324.0 ptxas info : Compile time = 0.528 ms 2025-09-07T06:29:41.5366677Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:41.5374054Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5377936Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5378687Z #22 324.0 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:41.5379339Z #22 324.0 ptxas info : Compile time = 0.533 ms 
2025-09-07T06:29:41.5379820Z #22 324.0 ptxas info : 10 bytes gmem 2025-09-07T06:29:41.5383188Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:41.5389450Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5394308Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5395365Z #22 324.0 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:41.5396239Z #22 324.0 ptxas info : Compile time = 612.044 ms 2025-09-07T06:29:41.5401260Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:41.5411475Z #22 324.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5417143Z #22 324.0 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:29:41.5418429Z #22 324.0 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:41.5419551Z #22 324.0 ptxas info : Compile time = 864.366 ms 2025-09-07T06:29:41.5425219Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:41.5435495Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5441444Z #22 324.0 40 bytes stack frame, 76 bytes spill stores, 84 bytes spill loads 2025-09-07T06:29:41.5442674Z #22 324.0 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:41.5443783Z #22 324.0 ptxas 
info : Compile time = 1744.795 ms 2025-09-07T06:29:41.5448962Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:41.5458342Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5463643Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5464973Z #22 324.0 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:41.5465878Z #22 324.0 ptxas info : Compile time = 1294.647 ms 2025-09-07T06:29:41.5471560Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:41.5481747Z #22 324.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5487404Z #22 324.0 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:29:41.5488672Z #22 324.0 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:41.5489775Z #22 324.0 ptxas info : Compile time = 1648.424 ms 2025-09-07T06:29:41.5495462Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:41.5505705Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5511356Z #22 324.0 48 bytes stack frame, 100 bytes spill stores, 124 bytes spill loads 2025-09-07T06:29:41.5512873Z #22 324.0 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:29:41.5513976Z #22 324.0 
ptxas info : Compile time = 2821.114 ms 2025-09-07T06:29:41.5519184Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:41.5528662Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5534034Z #22 324.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:41.5535037Z #22 324.0 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:41.5535895Z #22 324.0 ptxas info : Compile time = 896.727 ms 2025-09-07T06:29:41.5541834Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:41.5552055Z #22 324.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5557680Z #22 324.0 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:29:41.5558955Z #22 324.0 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:41.5560071Z #22 324.0 ptxas info : Compile time = 1201.941 ms 2025-09-07T06:29:41.5565562Z #22 324.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:41.5576019Z #22 324.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:41.5581665Z #22 324.0 40 bytes stack frame, 84 bytes spill stores, 88 bytes spill loads 2025-09-07T06:29:41.5582963Z #22 324.0 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:41.5584039Z #22 324.0 
ptxas info : Compile time = 2271.343 ms 2025-09-07T06:29:43.0285474Z #22 325.7 [18/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm100.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm100.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm100.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:43.1788438Z #22 325.7 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:29:43.1793710Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:43.1801743Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.1806634Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.1807745Z #22 325.7 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:43.1808644Z #22 325.7 ptxas info : Compile time = 1.897 ms 2025-09-07T06:29:43.1814326Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:43.1822848Z #22 
325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.1827413Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.1828239Z #22 325.7 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:43.1828936Z #22 325.7 ptxas info : Compile time = 1.006 ms 2025-09-07T06:29:43.1833483Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:43.1843134Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.1848361Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.1849411Z #22 325.7 ptxas info : Used 
4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:43.1850271Z #22 325.7 ptxas info : Compile time = 0.806 ms 2025-09-07T06:29:43.1855187Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:43.1864336Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.1869627Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.1870489Z #22 325.7 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:43.1871204Z #22 325.7 ptxas info : Compile time = 0.585 ms 2025-09-07T06:29:43.1875170Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:43.1883064Z #22 325.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.1891030Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.1892722Z #22 325.7 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:43.1893677Z #22 325.7 ptxas info : Compile time = 0.552 ms 2025-09-07T06:29:43.1898829Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:43.1908196Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.1913068Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.1914164Z #22 325.7 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:43.1915221Z #22 325.7 ptxas info : Compile time = 
0.552 ms 2025-09-07T06:29:43.1919681Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:43.1927994Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.1932711Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.1933645Z #22 325.7 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:43.1934510Z #22 325.7 ptxas info : Compile time = 0.534 ms 2025-09-07T06:29:43.1939439Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:43.1947415Z #22 325.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.1952354Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.1953517Z #22 325.7 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:43.1954375Z #22 325.7 ptxas info : Compile time = 0.730 ms 2025-09-07T06:29:43.1959632Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:43.1969711Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.1975298Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.1976339Z #22 325.7 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:43.1977276Z #22 325.7 ptxas info : Compile time = 
0.540 ms 2025-09-07T06:29:43.1977933Z #22 325.7 ptxas info : 10 bytes gmem 2025-09-07T06:29:43.1982859Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:43.1992410Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.1996702Z #22 325.7 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:29:43.1997733Z #22 325.7 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:43.1998626Z #22 325.7 ptxas info : Compile time = 757.277 ms 2025-09-07T06:29:43.2003305Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:43.2012013Z #22 325.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.2018043Z #22 325.7 40 bytes stack frame, 116 bytes spill stores, 132 bytes spill loads 2025-09-07T06:29:43.2019080Z #22 325.7 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:43.2019991Z #22 325.7 ptxas info : Compile time = 935.577 ms 2025-09-07T06:29:43.2024736Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:43.2033718Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.2039237Z #22 325.7 152 bytes stack frame, 348 bytes spill stores, 576 bytes spill loads 2025-09-07T06:29:43.2040408Z #22 325.7 ptxas info : Used 168 
registers, used 16 barriers, 152 bytes cumulative stack size 2025-09-07T06:29:43.2041398Z #22 325.7 ptxas info : Compile time = 1804.525 ms 2025-09-07T06:29:43.2046199Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:43.2057324Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.2062234Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.2063148Z #22 325.7 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:43.2063863Z #22 325.7 ptxas info : Compile time = 1241.533 ms 2025-09-07T06:29:43.2068341Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:43.2076691Z #22 325.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.2082151Z #22 325.7 32 bytes stack frame, 96 bytes spill stores, 104 bytes spill loads 2025-09-07T06:29:43.2083266Z #22 325.7 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:29:43.2084183Z #22 325.7 ptxas info : Compile time = 1442.161 ms 2025-09-07T06:29:43.2089014Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:43.2098236Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.2103711Z #22 325.7 56 bytes stack frame, 228 bytes spill stores, 308 bytes spill loads 2025-09-07T06:29:43.2105249Z #22 325.7 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 
2025-09-07T06:29:43.2106308Z #22 325.7 ptxas info : Compile time = 2363.860 ms 2025-09-07T06:29:43.2111414Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:43.2120567Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.2125754Z #22 325.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:43.2126697Z #22 325.7 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:43.2127513Z #22 325.7 ptxas info : Compile time = 873.735 ms 2025-09-07T06:29:43.2132972Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:43.2142443Z #22 325.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.2147549Z #22 325.7 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:29:43.2148677Z #22 325.7 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:29:43.2150080Z #22 325.7 ptxas info : Compile time = 981.724 ms 2025-09-07T06:29:43.2155210Z #22 325.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:43.2164623Z #22 325.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:43.2169818Z #22 325.7 56 bytes stack frame, 280 bytes spill stores, 336 bytes spill loads 2025-09-07T06:29:43.2171013Z #22 325.7 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 
2025-09-07T06:29:43.2172058Z #22 325.7 ptxas info : Compile time = 1697.019 ms 2025-09-07T06:29:46.7039284Z #22 329.4 [19/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP 
-DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:46.7059474Z #22 329.4 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:29:46.7065303Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:46.7075620Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7081076Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7082279Z #22 329.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:46.7083388Z #22 329.4 ptxas info : Compile time = 2.045 ms 2025-09-07T06:29:46.7088718Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 
'sm_80' 2025-09-07T06:29:46.7099221Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7104860Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7106083Z #22 329.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:46.7107215Z #22 329.4 ptxas info : Compile time = 21.108 ms 2025-09-07T06:29:46.7112916Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:46.7123713Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7129673Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7130913Z #22 329.4 ptxas info : 
Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:46.7131965Z #22 329.4 ptxas info : Compile time = 1.025 ms 2025-09-07T06:29:46.7137830Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:46.7148492Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7154786Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7156051Z #22 329.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:46.7157127Z #22 329.4 ptxas info : Compile time = 0.673 ms 2025-09-07T06:29:46.7162303Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:46.7172504Z 
#22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7177966Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7179215Z #22 329.4 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:29:46.7180250Z #22 329.4 ptxas info : Compile time = 0.595 ms 2025-09-07T06:29:46.7185735Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:46.7198325Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7203847Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7205010Z #22 329.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:46.7205918Z #22 329.4 ptxas 
info : Compile time = 0.568 ms 2025-09-07T06:29:46.7211354Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:46.7221441Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7227104Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7228214Z #22 329.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:46.7229196Z #22 329.4 ptxas info : Compile time = 0.567 ms 2025-09-07T06:29:46.7234441Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:46.7244484Z #22 329.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7249806Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7250920Z #22 329.4 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:29:46.7251867Z #22 329.4 ptxas info : Compile time = 0.576 ms 2025-09-07T06:29:46.7257481Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:46.7267528Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7273051Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7274151Z #22 329.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:46.7275091Z #22 329.4 ptxas info : Compile time = 0.570 ms 
2025-09-07T06:29:46.7280565Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:46.7290490Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7296576Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7297614Z #22 329.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:46.7298571Z #22 329.4 ptxas info : Compile time = 0.541 ms 2025-09-07T06:29:46.7299289Z #22 329.4 ptxas info : 10 bytes gmem 2025-09-07T06:29:46.7304489Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:46.7313987Z #22 329.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7319644Z #22 329.4 24 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:29:46.7320942Z #22 329.4 ptxas info : Used 168 registers, used 16 barriers, 24 bytes cumulative stack size 2025-09-07T06:29:46.7322008Z #22 329.4 ptxas info : Compile time = 854.956 ms 2025-09-07T06:29:46.7327280Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:46.7337277Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7342636Z #22 329.4 40 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads 2025-09-07T06:29:46.7343888Z #22 329.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:46.7345013Z #22 329.4 ptxas info 
: Compile time = 864.510 ms 2025-09-07T06:29:46.7350772Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:46.7361081Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7367176Z #22 329.4 56 bytes stack frame, 184 bytes spill stores, 196 bytes spill loads 2025-09-07T06:29:46.7368477Z #22 329.4 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:29:46.7369596Z #22 329.4 ptxas info : Compile time = 1037.850 ms 2025-09-07T06:29:46.7375494Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:46.7386009Z #22 329.4 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7492795Z #22 329.4 104 bytes stack frame, 432 bytes spill stores, 548 bytes spill loads 2025-09-07T06:29:46.7494679Z #22 329.4 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:29:46.7495849Z #22 329.4 ptxas info : Compile time = 1984.059 ms 2025-09-07T06:29:46.7501128Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:46.7510931Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7516262Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7517274Z #22 329.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:46.7518125Z #22 329.4 ptxas info : 
Compile time = 1418.959 ms 2025-09-07T06:29:46.7523580Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:46.7533849Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7539287Z #22 329.4 40 bytes stack frame, 80 bytes spill stores, 112 bytes spill loads 2025-09-07T06:29:46.7540973Z #22 329.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:46.7542050Z #22 329.4 ptxas info : Compile time = 1594.050 ms 2025-09-07T06:29:46.7547501Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:46.7557484Z #22 329.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7562986Z #22 329.4 64 bytes stack frame, 300 bytes spill stores, 340 bytes spill loads 2025-09-07T06:29:46.7564295Z #22 329.4 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:29:46.7565693Z #22 329.4 ptxas info : Compile time = 2554.111 ms 2025-09-07T06:29:46.7570943Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:46.7580735Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7586140Z #22 329.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:46.7587137Z #22 329.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:46.7587999Z #22 329.4 ptxas info : Compile time = 981.798 ms 
2025-09-07T06:29:46.7593772Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:46.7603817Z #22 329.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.7609228Z #22 329.4 40 bytes stack frame, 92 bytes spill stores, 112 bytes spill loads 2025-09-07T06:29:46.7610480Z #22 329.4 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:46.8531011Z #22 329.4 ptxas info : Compile time = 1099.950 ms 2025-09-07T06:29:46.8537124Z #22 329.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:46.8547242Z #22 329.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:46.8552831Z #22 329.4 88 bytes stack frame, 260 bytes spill stores, 304 bytes spill loads 2025-09-07T06:29:46.8554117Z #22 329.4 ptxas info : Used 168 registers, used 16 barriers, 88 bytes cumulative stack size 2025-09-07T06:29:46.8555185Z #22 329.4 ptxas info : Compile time = 1891.640 ms 2025-09-07T06:29:48.6816613Z #22 331.4 [20/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE 
-std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:48.6836245Z #22 331.4 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:29:48.6841792Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:48.6851244Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.6857157Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.6858353Z #22 331.4 ptxas info : 
Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:48.6859364Z #22 331.4 ptxas info : Compile time = 1.893 ms 2025-09-07T06:29:48.6865354Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:48.6875785Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.6881447Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.6882592Z #22 331.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:48.6883576Z #22 331.4 ptxas info : Compile time = 21.092 ms 2025-09-07T06:29:48.6889290Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 
'sm_80' 2025-09-07T06:29:48.6900344Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.6906339Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.6907497Z #22 331.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:48.6908495Z #22 331.4 ptxas info : Compile time = 0.976 ms 2025-09-07T06:29:48.6913825Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:48.6923219Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.6928408Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.6929569Z #22 331.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:48.6930570Z 
#22 331.4 ptxas info : Compile time = 0.672 ms 2025-09-07T06:29:48.6935938Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:48.6946073Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.6951622Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.6952758Z #22 331.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:48.6953701Z #22 331.4 ptxas info : Compile time = 0.578 ms 2025-09-07T06:29:48.6959183Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:48.6969253Z #22 331.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.6975099Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.6976268Z #22 331.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:48.6977287Z #22 331.4 ptxas info : Compile time = 0.570 ms 2025-09-07T06:29:48.6982462Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:48.6992393Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.6998054Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.6999221Z #22 331.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:48.7000229Z #22 331.4 ptxas info : Compile time = 0.649 ms 2025-09-07T06:29:48.7005866Z #22 331.4 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:48.7016501Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.7022255Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.7023644Z #22 331.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:48.7024620Z #22 331.4 ptxas info : Compile time = 0.561 ms 2025-09-07T06:29:48.7030152Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:48.7039770Z #22 331.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.7044943Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.7045978Z #22 331.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:48.7046863Z #22 331.4 ptxas info : Compile time = 0.545 ms 2025-09-07T06:29:48.7047535Z #22 331.4 ptxas info : 10 bytes gmem 2025-09-07T06:29:48.7052726Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:48.7062337Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.7067603Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.7068905Z #22 331.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:48.7069774Z #22 331.4 
ptxas info : Compile time = 855.230 ms 2025-09-07T06:29:48.7075615Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:48.7086004Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.7091805Z #22 331.4 64 bytes stack frame, 140 bytes spill stores, 176 bytes spill loads 2025-09-07T06:29:48.7093823Z #22 331.4 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:29:48.7094945Z #22 331.4 ptxas info : Compile time = 1171.713 ms 2025-09-07T06:29:48.7100819Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:48.7111726Z #22 331.4 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.7117690Z #22 331.4 160 bytes stack frame, 392 bytes spill stores, 560 bytes spill loads 2025-09-07T06:29:48.7119034Z #22 331.4 ptxas info : Used 168 registers, used 16 barriers, 160 bytes cumulative stack size 2025-09-07T06:29:48.7120161Z #22 331.4 ptxas info : Compile time = 2172.712 ms 2025-09-07T06:29:48.7125381Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:48.7135059Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.7140328Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.7141378Z #22 331.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:48.7142614Z #22 331.4 ptxas info : Compile time = 1343.706 ms 
2025-09-07T06:29:48.7148220Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:48.7158564Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.7163507Z #22 331.4 104 bytes stack frame, 196 bytes spill stores, 236 bytes spill loads 2025-09-07T06:29:48.7164684Z #22 331.4 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:29:48.7165756Z #22 331.4 ptxas info : Compile time = 1786.336 ms 2025-09-07T06:29:48.7171565Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:48.7181294Z #22 331.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.7186747Z #22 331.4 104 bytes stack frame, 344 bytes spill stores, 404 bytes spill loads 2025-09-07T06:29:48.7188063Z #22 331.4 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:29:48.7189137Z #22 331.4 ptxas info : Compile time = 2813.512 ms 2025-09-07T06:29:48.8315060Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:48.8324532Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.8328067Z #22 331.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:48.8328787Z #22 331.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:48.8329370Z #22 331.4 ptxas info : Compile time = 1222.338 ms 2025-09-07T06:29:48.8333214Z #22 331.4 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:48.8340591Z #22 331.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.8345936Z #22 331.4 56 bytes stack frame, 140 bytes spill stores, 168 bytes spill loads 2025-09-07T06:29:48.8347390Z #22 331.4 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:29:48.8348609Z #22 331.4 ptxas info : Compile time = 1228.047 ms 2025-09-07T06:29:48.8354637Z #22 331.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:48.8365287Z #22 331.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:48.8371127Z #22 331.4 136 bytes stack frame, 384 bytes spill stores, 432 bytes spill loads 2025-09-07T06:29:48.8372632Z #22 331.4 ptxas info : Used 168 registers, used 16 barriers, 136 bytes cumulative stack size 2025-09-07T06:29:48.8373836Z #22 331.4 ptxas info : Compile time = 2152.190 ms 2025-09-07T06:29:52.8338316Z #22 335.5 [21/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 
--generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:52.8359255Z #22 335.5 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:29:52.8365105Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:52.8373130Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8376603Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8377367Z #22 335.5 ptxas info : Used 4 registers, 
used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:52.8378007Z #22 335.5 ptxas info : Compile time = 1.842 ms 2025-09-07T06:29:52.8381471Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:52.8387980Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8391492Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8392553Z #22 335.5 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:52.8393233Z #22 335.5 ptxas info : Compile time = 0.870 ms 2025-09-07T06:29:52.8397015Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:52.8406916Z #22 335.5 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8414123Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8415352Z #22 335.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:52.8416340Z #22 335.5 ptxas info : Compile time = 1.060 ms 2025-09-07T06:29:52.8422136Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:52.8433700Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8439759Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8440932Z #22 335.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes 
cmem[0] 2025-09-07T06:29:52.8441933Z #22 335.5 ptxas info : Compile time = 0.663 ms 2025-09-07T06:29:52.8447526Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:52.8456825Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8460309Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8461062Z #22 335.5 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:52.8461699Z #22 335.5 ptxas info : Compile time = 0.576 ms 2025-09-07T06:29:52.8465286Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:52.8471882Z #22 335.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8475746Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8476484Z #22 335.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:52.8477140Z #22 335.5 ptxas info : Compile time = 0.556 ms 2025-09-07T06:29:52.8480724Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:52.8488897Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8495799Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8497371Z #22 335.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:52.8498298Z #22 335.5 ptxas info : Compile time = 0.549 ms 
2025-09-07T06:29:52.8504131Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:52.8514411Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8519976Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8521147Z #22 335.5 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:52.8522108Z #22 335.5 ptxas info : Compile time = 0.524 ms 2025-09-07T06:29:52.8527914Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:52.8538243Z #22 335.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8541820Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8542571Z #22 335.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:52.8543618Z #22 335.5 ptxas info : Compile time = 0.558 ms 2025-09-07T06:29:52.8547194Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:52.8553698Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8557320Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8558073Z #22 335.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:52.8558727Z #22 335.5 ptxas info : Compile time = 0.520 ms 
2025-09-07T06:29:52.8559202Z #22 335.5 ptxas info : 10 bytes gmem 2025-09-07T06:29:52.8562855Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:52.8570268Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8576289Z #22 335.5 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:29:52.8577727Z #22 335.5 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 2025-09-07T06:29:52.8578964Z #22 335.5 ptxas info : Compile time = 749.049 ms 2025-09-07T06:29:52.8584768Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:52.8595256Z #22 335.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8600939Z #22 335.5 24 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:29:52.8602182Z #22 335.5 ptxas info : Used 168 registers, used 16 barriers, 24 bytes cumulative stack size 2025-09-07T06:29:52.8603346Z #22 335.5 ptxas info : Compile time = 754.264 ms 2025-09-07T06:29:52.8609678Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:52.8621043Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8625710Z #22 335.5 40 bytes stack frame, 116 bytes spill stores, 132 bytes spill loads 2025-09-07T06:29:52.8626576Z #22 335.5 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative 
stack size 2025-09-07T06:29:52.8627309Z #22 335.5 ptxas info : Compile time = 931.937 ms 2025-09-07T06:29:52.8631389Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:52.8638198Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8642040Z #22 335.5 152 bytes stack frame, 348 bytes spill stores, 576 bytes spill loads 2025-09-07T06:29:52.8642912Z #22 335.5 ptxas info : Used 168 registers, used 16 barriers, 152 bytes cumulative stack size 2025-09-07T06:29:52.8643636Z #22 335.5 ptxas info : Compile time = 1925.159 ms 2025-09-07T06:29:52.8647081Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:52.8654190Z #22 335.5 ptxas 
info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8659743Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8660878Z #22 335.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:52.8661824Z #22 335.5 ptxas info : Compile time = 1325.971 ms 2025-09-07T06:29:52.8667838Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:52.8678866Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8684688Z #22 335.5 32 bytes stack frame, 96 bytes spill stores, 104 bytes spill loads 2025-09-07T06:29:52.8685947Z #22 335.5 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:29:52.8687109Z #22 335.5 ptxas info : 
Compile time = 1490.375 ms 2025-09-07T06:29:52.8714854Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:52.8721558Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8725286Z #22 335.5 56 bytes stack frame, 228 bytes spill stores, 308 bytes spill loads 2025-09-07T06:29:52.8726154Z #22 335.5 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:29:52.8726901Z #22 335.5 ptxas info : Compile time = 2380.120 ms 2025-09-07T06:29:52.8730380Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:52.8736906Z #22 335.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8742041Z #22 335.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:52.8743218Z #22 335.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:52.8744199Z #22 335.5 ptxas info : Compile time = 873.496 ms 2025-09-07T06:29:52.8750647Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:52.8761937Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8767825Z #22 335.5 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:29:52.8769145Z #22 335.5 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:29:52.8770312Z #22 335.5 ptxas info : Compile time = 1053.745 ms 
2025-09-07T06:29:52.8776516Z #22 335.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:52.8787230Z #22 335.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:52.8793441Z #22 335.5 56 bytes stack frame, 280 bytes spill stores, 336 bytes spill loads 2025-09-07T06:29:52.8794558Z #22 335.5 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:29:52.9832032Z #22 335.5 ptxas info : Compile time = 1936.515 ms 2025-09-07T06:29:54.8635269Z #22 337.5 [22/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:54.8654664Z #22 337.5 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:29:54.8660240Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:54.8669496Z #22 337.5 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8674707Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.8675779Z #22 337.5 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:54.8676697Z #22 337.5 ptxas info : Compile time = 1.868 ms 2025-09-07T06:29:54.8681466Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:54.8690908Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8696730Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.8697921Z #22 337.5 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:54.8698933Z #22 337.5 ptxas info 
: Compile time = 21.084 ms 2025-09-07T06:29:54.8704401Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:54.8714328Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8719971Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.8721095Z #22 337.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:54.8722121Z #22 337.5 ptxas info : Compile time = 0.800 ms 2025-09-07T06:29:54.8727212Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:54.8737325Z #22 337.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8743063Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.8744563Z #22 337.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:54.8745614Z #22 337.5 ptxas info : Compile time = 0.787 ms 2025-09-07T06:29:54.8750407Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:54.8760055Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8765453Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.8766676Z #22 337.5 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:54.8767716Z #22 337.5 ptxas info : Compile time = 
0.559 ms 2025-09-07T06:29:54.8773276Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:54.8783198Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8788797Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.8790319Z #22 337.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:54.8791405Z #22 337.5 ptxas info : Compile time = 0.539 ms 2025-09-07T06:29:54.8898296Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:54.8907900Z #22 337.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8913145Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.8914224Z #22 337.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:54.8915138Z #22 337.5 ptxas info : Compile time = 0.578 ms 2025-09-07T06:29:54.8920588Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:54.8929144Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8934006Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.8935043Z #22 337.5 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:29:54.8935937Z #22 337.5 ptxas info : Compile time = 0.598 ms 
2025-09-07T06:29:54.8941002Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:54.8950427Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8955605Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.8956654Z #22 337.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:54.8957580Z #22 337.5 ptxas info : Compile time = 0.531 ms 2025-09-07T06:29:54.8963109Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:54.8972817Z #22 337.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8978057Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.8978995Z #22 337.5 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:29:54.8979725Z #22 337.5 ptxas info : Compile time = 0.487 ms 2025-09-07T06:29:54.8980328Z #22 337.5 ptxas info : 10 bytes gmem 2025-09-07T06:29:54.8985038Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:54.8994513Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.8999494Z #22 337.5 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:29:54.9000684Z #22 337.5 ptxas info : Used 168 registers, used 16 barriers, 8 bytes cumulative stack size 
2025-09-07T06:29:54.9001717Z #22 337.5 ptxas info : Compile time = 776.217 ms 2025-09-07T06:29:54.9006783Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:54.9016211Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.9021316Z #22 337.5 24 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:29:54.9022491Z #22 337.5 ptxas info : Used 168 registers, used 16 barriers, 24 bytes cumulative stack size 2025-09-07T06:29:54.9023533Z #22 337.5 ptxas info : Compile time = 786.718 ms 2025-09-07T06:29:54.9029019Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:54.9038302Z #22 337.5 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.9043647Z #22 337.5 40 bytes stack frame, 116 bytes spill stores, 132 bytes spill loads 2025-09-07T06:29:54.9044834Z #22 337.5 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:29:54.9045861Z #22 337.5 ptxas info : Compile time = 956.877 ms 2025-09-07T06:29:54.9051492Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:54.9061610Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.9067104Z #22 337.5 152 bytes stack frame, 348 bytes spill stores, 576 bytes spill loads 2025-09-07T06:29:54.9068318Z #22 337.5 ptxas 
info : Used 168 registers, used 16 barriers, 152 bytes cumulative stack size 2025-09-07T06:29:54.9069379Z #22 337.5 ptxas info : Compile time = 1808.288 ms 2025-09-07T06:29:54.9074450Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:54.9083489Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.9087997Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.9088930Z #22 337.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:54.9089714Z #22 337.5 ptxas info : Compile time = 1248.865 ms 2025-09-07T06:29:54.9094965Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:54.9104065Z #22 337.5 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.9108976Z #22 337.5 32 bytes stack frame, 96 bytes spill stores, 104 bytes spill loads 2025-09-07T06:29:54.9110154Z #22 337.5 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:29:54.9111138Z #22 337.5 ptxas info : Compile time = 1439.461 ms 2025-09-07T06:29:54.9116309Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:54.9125449Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.9130593Z #22 337.5 56 bytes stack frame, 228 bytes spill stores, 308 bytes spill loads 2025-09-07T06:29:54.9131799Z #22 337.5 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 
2025-09-07T06:29:54.9133052Z #22 337.5 ptxas info : Compile time = 2475.347 ms 2025-09-07T06:29:54.9138074Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:54.9147098Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.9152609Z #22 337.5 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:54.9153649Z #22 337.5 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:54.9154550Z #22 337.5 ptxas info : Compile time = 867.363 ms 2025-09-07T06:29:54.9160229Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:54.9170361Z #22 337.5 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.9176286Z #22 337.5 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:29:54.9177610Z #22 337.5 ptxas info : Used 168 registers, used 16 barriers, 32 bytes cumulative stack size 2025-09-07T06:29:54.9178755Z #22 337.5 ptxas info : Compile time = 1050.241 ms 2025-09-07T06:29:54.9184703Z #22 337.5 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:54.9194663Z #22 337.5 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:54.9200448Z #22 337.5 56 bytes stack frame, 280 bytes spill stores, 336 bytes spill loads 2025-09-07T06:29:55.0132222Z #22 337.5 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 
2025-09-07T06:29:55.0133427Z #22 337.5 ptxas info : Compile time = 1808.061 ms 2025-09-07T06:29:57.7658544Z #22 340.4 [23/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP 
-DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:29:57.7678851Z #22 340.4 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:29:57.7684609Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:57.7695456Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.7701516Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.7702714Z #22 340.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:57.7703694Z #22 340.4 ptxas info : Compile time = 1.836 ms 2025-09-07T06:29:57.7709859Z #22 340.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:57.7721126Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.7725801Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.7726589Z #22 340.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:57.7727270Z #22 340.4 ptxas info : Compile time = 0.893 ms 2025-09-07T06:29:57.7731134Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:57.7738456Z #22 340.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.7742825Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.7743588Z #22 340.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:57.7744265Z #22 340.4 ptxas info : Compile time = 21.069 ms 2025-09-07T06:29:57.7747720Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:57.7756678Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.7762710Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.7764102Z #22 340.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:57.7765095Z #22 340.4 ptxas info : Compile time = 0.703 ms 
2025-09-07T06:29:57.7770826Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:57.7781754Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.7787532Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.7788661Z #22 340.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:57.7789678Z #22 340.4 ptxas info : Compile time = 0.592 ms 2025-09-07T06:29:57.7795819Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:57.7806435Z #22 340.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.7810792Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.7811934Z #22 340.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:57.7812755Z #22 340.4 ptxas info : Compile time = 0.564 ms 2025-09-07T06:29:57.7816211Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:57.7822474Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.7825953Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.7826720Z #22 340.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:57.7827401Z #22 340.4 ptxas info : Compile time = 0.616 ms 2025-09-07T06:29:57.7831465Z #22 340.4 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:57.7841028Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.7847512Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.7848715Z #22 340.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:57.7849685Z #22 340.4 ptxas info : Compile time = 0.576 ms 2025-09-07T06:29:57.7855689Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:29:57.7866288Z #22 340.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.7872184Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.7873308Z #22 340.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:29:57.7874301Z #22 340.4 ptxas info : Compile time = 0.550 ms 2025-09-07T06:29:57.7875354Z #22 340.4 ptxas info : 10 bytes gmem 2025-09-07T06:29:57.7880994Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:57.7891269Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.7996116Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.7996874Z #22 340.4 ptxas info : Used 168 registers, used 16 barriers 
2025-09-07T06:29:57.7997461Z #22 340.4 ptxas info : Compile time = 827.918 ms 2025-09-07T06:29:57.8001761Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:57.8009025Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.8013180Z #22 340.4 64 bytes stack frame, 140 bytes spill stores, 176 bytes spill loads 2025-09-07T06:29:57.8014047Z #22 340.4 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:29:57.8014814Z #22 340.4 ptxas info : Compile time = 1174.650 ms 2025-09-07T06:29:57.8019607Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:29:57.8032240Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.8038475Z #22 340.4 160 bytes stack frame, 392 bytes spill stores, 560 bytes spill loads 2025-09-07T06:29:57.8039703Z #22 340.4 ptxas info : Used 168 registers, used 16 barriers, 160 bytes cumulative stack size 2025-09-07T06:29:57.8041296Z #22 340.4 ptxas info : Compile time = 2064.233 ms 2025-09-07T06:29:57.8046627Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:57.8056662Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.8062198Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.8063261Z #22 340.4 ptxas info : Used 168 registers, used 16 barriers 
2025-09-07T06:29:57.8064179Z #22 340.4 ptxas info : Compile time = 1281.344 ms 2025-09-07T06:29:57.8070399Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:57.8080384Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.8084207Z #22 340.4 104 bytes stack frame, 196 bytes spill stores, 236 bytes spill loads 2025-09-07T06:29:57.8085118Z #22 340.4 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:29:57.8085872Z #22 340.4 ptxas info : Compile time = 1725.642 ms 2025-09-07T06:29:57.8089577Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:57.8096912Z #22 340.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.8100682Z #22 340.4 104 bytes stack frame, 344 bytes spill stores, 404 bytes spill loads 2025-09-07T06:29:57.8101566Z #22 340.4 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:29:57.8102325Z #22 340.4 ptxas info : Compile time = 2721.478 ms 2025-09-07T06:29:57.8106060Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:57.9164151Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.9169931Z #22 340.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:29:57.9171033Z #22 340.4 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:29:57.9171965Z #22 340.4 ptxas info : Compile time = 877.903 ms 2025-09-07T06:29:57.9178340Z #22 
340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:57.9185031Z #22 340.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.9188709Z #22 340.4 56 bytes stack frame, 140 bytes spill stores, 168 bytes spill loads 2025-09-07T06:29:57.9189564Z #22 340.4 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:29:57.9190354Z #22 340.4 ptxas info : Compile time = 1249.700 ms 2025-09-07T06:29:57.9194346Z #22 340.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:29:57.9201870Z #22 340.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:29:57.9208017Z #22 340.4 136 bytes stack frame, 384 bytes spill stores, 432 bytes spill loads 2025-09-07T06:29:57.9209522Z #22 340.4 ptxas info : Used 168 registers, used 16 barriers, 136 bytes cumulative stack size 2025-09-07T06:29:57.9210737Z #22 340.4 ptxas info : Compile time = 2209.972 ms 2025-09-07T06:30:03.6170538Z #22 346.3 [24/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda 
-D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:03.6193625Z #22 346.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:30:03.6199512Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:03.6210102Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6216033Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:30:03.6216938Z #22 346.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:03.6217624Z #22 346.3 ptxas info : Compile time = 1.884 ms 2025-09-07T06:30:03.6221142Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:03.6227551Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6231504Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6232244Z #22 346.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:03.6232896Z #22 346.3 ptxas info : Compile time = 0.824 ms 2025-09-07T06:30:03.6236670Z #22 346.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:03.6243664Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6249824Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6250637Z #22 346.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:03.6251293Z #22 346.3 ptxas info : Compile time = 20.836 ms 2025-09-07T06:30:03.6255306Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:03.6263857Z #22 346.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6270639Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6271947Z #22 346.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:03.6273084Z #22 346.3 ptxas info : Compile time = 0.976 ms 2025-09-07T06:30:03.6279444Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:03.6290101Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6296677Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6297792Z #22 346.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:30:03.6298826Z #22 346.3 ptxas info : Compile time 
= 0.686 ms 2025-09-07T06:30:03.6304742Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:03.6315047Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6318760Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6319973Z #22 346.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:03.6320647Z #22 346.3 ptxas info : Compile time = 0.602 ms 2025-09-07T06:30:03.6324237Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:03.6330841Z #22 346.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6334670Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6335430Z #22 346.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:03.6336124Z #22 346.3 ptxas info : Compile time = 0.559 ms 2025-09-07T06:30:03.6339614Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:03.6345960Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6349445Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6350191Z #22 346.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:30:03.6351240Z #22 346.3 ptxas info : Compile time = 0.577 ms 
2025-09-07T06:30:03.6354844Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:03.6364378Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6371208Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6372755Z #22 346.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:03.6373775Z #22 346.3 ptxas info : Compile time = 0.561 ms 2025-09-07T06:30:03.6380630Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:03.6391582Z #22 346.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6499521Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6500721Z #22 346.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:03.6519196Z #22 346.3 ptxas info : Compile time = 0.541 ms 2025-09-07T06:30:03.6520225Z #22 346.3 ptxas info : 10 bytes gmem 2025-09-07T06:30:03.6525865Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:03.6534502Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6537987Z #22 346.3 24 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:30:03.6538825Z #22 346.3 ptxas info : Used 168 registers, used 16 barriers, 24 bytes cumulative stack 
size 2025-09-07T06:30:03.6540027Z #22 346.3 ptxas info : Compile time = 823.031 ms 2025-09-07T06:30:03.6543565Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:03.6549995Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESB_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SC_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6553609Z #22 346.3 40 bytes stack frame, 92 bytes spill stores, 92 bytes spill loads 2025-09-07T06:30:03.6554466Z #22 346.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:30:03.6555187Z #22 346.3 ptxas info : Compile time = 825.973 ms 2025-09-07T06:30:03.6559306Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:03.6566183Z #22 346.3 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6570009Z #22 346.3 56 bytes stack frame, 184 bytes spill stores, 196 bytes spill loads 2025-09-07T06:30:03.6570850Z #22 346.3 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:30:03.6571588Z #22 346.3 ptxas info : Compile time = 982.753 ms 2025-09-07T06:30:03.6576684Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:03.6588866Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi176EEESA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6595903Z #22 346.3 104 bytes stack frame, 432 bytes spill stores, 548 bytes spill loads 2025-09-07T06:30:03.6597381Z #22 346.3 
ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:30:03.6599011Z #22 346.3 ptxas info : Compile time = 1843.671 ms 2025-09-07T06:30:03.6604501Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:03.6615106Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6620915Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6621988Z #22 346.3 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:30:03.6622823Z #22 346.3 ptxas info : Compile time = 1361.689 ms 2025-09-07T06:30:03.6629131Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:03.6635854Z #22 346.3 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6639495Z #22 346.3 40 bytes stack frame, 80 bytes spill stores, 112 bytes spill loads 2025-09-07T06:30:03.6640366Z #22 346.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:30:03.6641093Z #22 346.3 ptxas info : Compile time = 1541.073 ms 2025-09-07T06:30:03.6644669Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:03.6651354Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6655243Z #22 346.3 64 bytes stack frame, 300 bytes spill stores, 340 bytes spill loads 2025-09-07T06:30:03.6656089Z #22 346.3 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack 
size 2025-09-07T06:30:03.6656832Z #22 346.3 ptxas info : Compile time = 2461.703 ms 2025-09-07T06:30:03.6660328Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:03.6667000Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6670492Z #22 346.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:03.6671436Z #22 346.3 ptxas info : Used 168 registers, used 16 barriers 2025-09-07T06:30:03.6672341Z #22 346.3 ptxas info : Compile time = 950.274 ms 2025-09-07T06:30:03.6678310Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:03.6690565Z #22 346.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.6696165Z #22 346.3 40 bytes stack frame, 92 bytes spill stores, 112 bytes spill loads 2025-09-07T06:30:03.6697293Z #22 346.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:30:03.6698267Z #22 346.3 ptxas info : Compile time = 1087.619 ms 2025-09-07T06:30:03.6703211Z #22 346.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:03.6711989Z #22 346.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_SA_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdISB_S9_SC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:03.7664330Z #22 346.3 88 bytes stack frame, 260 bytes spill stores, 304 bytes spill loads 2025-09-07T06:30:03.7665571Z #22 346.3 ptxas info : Used 168 registers, used 16 barriers, 88 bytes cumulative stack size 
2025-09-07T06:30:03.7666610Z #22 346.3 ptxas info : Compile time = 1908.014 ms 2025-09-07T06:30:34.3143199Z #22 377.0 [25/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP 
-DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:34.3161392Z #22 377.0 ptxas info : 10 bytes gmem 2025-09-07T06:30:34.3165872Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.3174266Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.3178887Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.3179827Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.3180581Z #22 377.0 ptxas info : Compile time = 2.139 ms 2025-09-07T06:30:34.3185233Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.3193908Z #22 377.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.3199073Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.3200008Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.3200743Z #22 377.0 ptxas info : Compile time = 21.285 ms 2025-09-07T06:30:34.3205495Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.3214239Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.3218906Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.3219785Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.3220987Z #22 377.0 ptxas info : Compile time = 1.197 ms 2025-09-07T06:30:34.3225619Z #22 377.0 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.3234140Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4623417Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4624249Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.4624984Z #22 377.0 ptxas info : Compile time = 0.747 ms 2025-09-07T06:30:34.4630016Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4639399Z #22 377.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4644686Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4645723Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.4646475Z #22 377.0 ptxas info : Compile time = 0.668 ms 2025-09-07T06:30:34.4652171Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4661935Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4666857Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4667849Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.4668659Z #22 377.0 ptxas info : Compile time = 0.651 ms 
2025-09-07T06:30:34.4674076Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4682411Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4687233Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4688147Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.4688900Z #22 377.0 ptxas info : Compile time = 0.656 ms 2025-09-07T06:30:34.4694595Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4703848Z #22 377.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4708758Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4709702Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.4710478Z #22 377.0 ptxas info : Compile time = 0.655 ms 2025-09-07T06:30:34.4715332Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4725158Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4730113Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4731012Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.4731776Z #22 377.0 ptxas info : Compile time = 0.675 ms 
2025-09-07T06:30:34.4737298Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4746001Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4750860Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4751796Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.4752554Z #22 377.0 ptxas info : Compile time = 0.632 ms 2025-09-07T06:30:34.4757580Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4766878Z #22 377.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4772089Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4773229Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.4774030Z #22 377.0 ptxas info : Compile time = 0.606 ms 2025-09-07T06:30:34.4779120Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:34.4788892Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4794264Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4795248Z #22 377.0 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:34.4796049Z #22 377.0 ptxas info : Compile time = 0.654 ms 
2025-09-07T06:30:34.4796864Z #22 377.0 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:30:34.4801654Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4809402Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4812886Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4813681Z #22 377.0 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:30:34.4814343Z #22 377.0 ptxas info : Compile time = 923.594 ms 2025-09-07T06:30:34.4818465Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4825623Z #22 377.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4829649Z #22 377.0 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:34.4830428Z #22 377.0 ptxas info : Used 248 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:30:34.4831199Z #22 377.0 ptxas info : Compile time = 1646.172 ms 2025-09-07T06:30:34.4835646Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4843844Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4849072Z #22 377.0 96 bytes stack frame, 124 bytes spill stores, 204 bytes spill loads 2025-09-07T06:30:34.4850340Z #22 377.0 ptxas info : Used 255 registers, used 6 barriers, 96 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:30:34.4851521Z #22 377.0 ptxas info : Compile time = 2281.767 ms 
2025-09-07T06:30:34.4856720Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4866207Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4871303Z #22 377.0 144 bytes stack frame, 156 bytes spill stores, 180 bytes spill loads 2025-09-07T06:30:34.4872711Z #22 377.0 ptxas info : Used 255 registers, used 2 barriers, 144 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:30:34.4873961Z #22 377.0 ptxas info : Compile time = 1261.741 ms 2025-09-07T06:30:34.4879239Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4888515Z #22 377.0 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4957451Z #22 377.0 56 bytes stack frame, 72 bytes spill stores, 84 bytes spill loads 2025-09-07T06:30:34.4958737Z #22 377.0 ptxas info : Used 255 registers, used 6 barriers, 56 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:34.4959876Z #22 377.0 ptxas info : Compile time = 1800.884 ms 2025-09-07T06:30:34.4964772Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.4973929Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.4979746Z #22 377.0 48 bytes stack frame, 60 bytes spill stores, 72 bytes spill loads 2025-09-07T06:30:34.4981140Z #22 377.0 ptxas info : Used 255 registers, used 6 barriers, 48 bytes 
cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:34.4982340Z #22 377.0 ptxas info : Compile time = 3058.331 ms 2025-09-07T06:30:34.4986820Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.5000953Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.5005555Z #22 377.0 112 bytes stack frame, 160 bytes spill stores, 228 bytes spill loads 2025-09-07T06:30:34.5006559Z #22 377.0 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:30:34.5007391Z #22 377.0 ptxas info : Compile time = 2678.462 ms 2025-09-07T06:30:34.5011070Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.5019175Z #22 377.0 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.5023833Z #22 377.0 112 bytes stack frame, 236 bytes spill stores, 332 bytes spill loads 2025-09-07T06:30:34.5024912Z #22 377.0 ptxas info : Used 255 registers, used 6 barriers, 112 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:34.5025791Z #22 377.0 ptxas info : Compile time = 3973.003 ms 2025-09-07T06:30:34.5030115Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.5039012Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.5044552Z #22 377.0 112 bytes stack frame, 240 bytes spill stores, 388 bytes spill loads 2025-09-07T06:30:34.5045863Z #22 377.0 ptxas info : Used 255 registers, used 
6 barriers, 112 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:34.5046984Z #22 377.0 ptxas info : Compile time = 4899.695 ms 2025-09-07T06:30:34.5051752Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.5060971Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.5066188Z #22 377.0 144 bytes stack frame, 232 bytes spill stores, 448 bytes spill loads 2025-09-07T06:30:34.5067389Z #22 377.0 ptxas info : Used 255 registers, used 6 barriers, 144 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:30:34.5068535Z #22 377.0 ptxas info : Compile time = 2157.508 ms 2025-09-07T06:30:34.5073739Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 
'sm_80' 2025-09-07T06:30:34.5083298Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.5088561Z #22 377.0 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads 2025-09-07T06:30:34.5089963Z #22 377.0 ptxas info : Used 255 registers, used 6 barriers, 144 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:34.5091137Z #22 377.0 ptxas info : Compile time = 1620.978 ms 2025-09-07T06:30:34.5096395Z #22 377.0 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:34.5106929Z #22 377.0 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:34.5112773Z #22 377.0 136 bytes stack frame, 248 bytes spill stores, 324 bytes spill loads 
2025-09-07T06:30:34.5114312Z #22 377.0 ptxas info : Used 255 registers, used 6 barriers, 136 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:34.5115591Z #22 377.0 ptxas info : Compile time = 2634.584 ms 2025-09-07T06:30:50.0842498Z #22 392.8 [26/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:50.0860470Z #22 392.8 ptxas info : 10 bytes gmem 2025-09-07T06:30:50.0865510Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.0874903Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.0879901Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.0880716Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.0881433Z #22 392.8 ptxas info : Compile time = 1.955 ms 2025-09-07T06:30:50.0885818Z #22 392.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.0894921Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.0899494Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.0900398Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.0901165Z #22 392.8 ptxas info : Compile time = 1.328 ms 2025-09-07T06:30:50.0906143Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.0914530Z #22 392.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.0919697Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.0920662Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.0921496Z #22 392.8 ptxas info : Compile time = 1.181 ms 2025-09-07T06:30:50.0926556Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.0935162Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.0939926Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.0940818Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.0941536Z #22 392.8 ptxas info : Compile time = 0.776 ms 2025-09-07T06:30:50.0946481Z #22 392.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.0955793Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.0961212Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.0962071Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.0962834Z #22 392.8 ptxas info : Compile time = 0.759 ms 2025-09-07T06:30:50.0967993Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.0978298Z #22 392.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.0983785Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.0984795Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.0985598Z #22 392.8 ptxas info : Compile time = 0.696 ms 2025-09-07T06:30:50.0990463Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.0999311Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1004185Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.1005168Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.1005987Z #22 392.8 ptxas info : Compile time = 0.687 ms 2025-09-07T06:30:50.1011573Z #22 392.8 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.1021273Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1026712Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.1027636Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.1028392Z #22 392.8 ptxas info : Compile time = 0.714 ms 2025-09-07T06:30:50.1033624Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.1043926Z #22 392.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1049808Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.1050612Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.1051381Z #22 392.8 ptxas info : Compile time = 0.712 ms 2025-09-07T06:30:50.1057000Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.1067224Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1072808Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.1073832Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.1074694Z #22 392.8 ptxas info : Compile time = 0.711 ms 
2025-09-07T06:30:50.1080648Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.1091161Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1097682Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.1098700Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.1099531Z #22 392.8 ptxas info : Compile time = 0.689 ms 2025-09-07T06:30:50.1105427Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:50.1116274Z #22 392.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1122090Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.1123138Z #22 392.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:50.1124022Z #22 392.8 ptxas info : Compile time = 0.670 ms 2025-09-07T06:30:50.1124859Z #22 392.8 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:30:50.1129892Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.1139380Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1144499Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.1145688Z #22 392.8 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:30:50.1146738Z #22 392.8 ptxas info : Compile time = 
1079.497 ms 2025-09-07T06:30:50.1152029Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.1161712Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1167328Z #22 392.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:50.1168514Z #22 392.8 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:30:50.1169542Z #22 392.8 ptxas info : Compile time = 2160.304 ms 2025-09-07T06:30:50.1175139Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.1185128Z #22 392.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1190806Z #22 392.8 216 bytes stack frame, 232 bytes spill stores, 332 bytes spill loads 2025-09-07T06:30:50.1215275Z #22 392.8 ptxas info : Used 255 registers, used 6 barriers, 216 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:30:50.1216695Z #22 392.8 ptxas info : Compile time = 2532.132 ms 2025-09-07T06:30:50.1221987Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.1230457Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1235003Z #22 392.8 40 bytes stack frame, 52 bytes spill stores, 68 bytes spill loads 2025-09-07T06:30:50.1236280Z #22 392.8 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:30:50.1237421Z #22 392.8 
ptxas info : Compile time = 1403.121 ms 2025-09-07T06:30:50.1242406Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.1252195Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1258482Z #22 392.8 176 bytes stack frame, 192 bytes spill stores, 332 bytes spill loads 2025-09-07T06:30:50.1259735Z #22 392.8 ptxas info : Used 255 registers, used 6 barriers, 176 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:50.1260851Z #22 392.8 ptxas info : Compile time = 1643.228 ms 2025-09-07T06:30:50.1265992Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.1275519Z #22 392.8 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1280886Z #22 392.8 248 bytes stack frame, 340 bytes spill stores, 544 bytes spill loads 2025-09-07T06:30:50.1282495Z #22 392.8 ptxas info : Used 255 registers, used 6 barriers, 248 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:50.1283650Z #22 392.8 ptxas info : Compile time = 3121.857 ms 2025-09-07T06:30:50.1288297Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.1297172Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1302093Z #22 392.8 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:30:50.1303530Z #22 392.8 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack 
size, 1400 bytes cmem[0] 2025-09-07T06:30:50.1304721Z #22 392.8 ptxas info : Compile time = 2582.564 ms 2025-09-07T06:30:50.1310043Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.1319606Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1324637Z #22 392.8 48 bytes stack frame, 64 bytes spill stores, 76 bytes spill loads 2025-09-07T06:30:50.1325887Z #22 392.8 ptxas info : Used 255 registers, used 6 barriers, 48 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:50.1326971Z #22 392.8 ptxas info : Compile time = 3193.047 ms 2025-09-07T06:30:50.1331869Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:30:50.1341083Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1346431Z #22 392.8 112 bytes stack frame, 204 bytes spill stores, 296 bytes spill loads 2025-09-07T06:30:50.1348107Z #22 392.8 ptxas info : Used 255 registers, used 6 barriers, 112 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:50.1349187Z #22 392.8 ptxas info : Compile time = 5305.959 ms 2025-09-07T06:30:50.1354091Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.1363481Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.1368604Z #22 392.8 272 bytes stack frame, 572 bytes spill stores, 700 bytes spill loads 2025-09-07T06:30:50.1370011Z #22 
392.8 ptxas info : Used 255 registers, used 6 barriers, 272 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:30:50.1371255Z #22 392.8 ptxas info : Compile time = 2654.332 ms 2025-09-07T06:30:50.1376802Z #22 392.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.2337357Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.2343182Z #22 392.8 256 bytes stack frame, 376 bytes spill stores, 652 bytes spill loads 2025-09-07T06:30:50.2344577Z #22 392.8 ptxas info : Used 255 registers, used 6 barriers, 256 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:50.2345714Z #22 392.8 ptxas info : Compile time = 2993.727 ms 2025-09-07T06:30:50.2350957Z #22 392.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:50.2360240Z #22 392.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:50.2365606Z #22 392.8 288 bytes stack frame, 428 bytes spill stores, 764 bytes spill loads 2025-09-07T06:30:50.2366890Z #22 392.8 ptxas info : Used 255 registers, used 6 barriers, 288 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:50.2368018Z #22 392.8 ptxas info : Compile time = 5197.005 ms 2025-09-07T06:30:53.5998240Z #22 396.3 [27/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:53.6016748Z #22 396.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:30:53.6022367Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:30:53.6032206Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6037470Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:53.6038627Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:53.6039651Z #22 396.3 ptxas info : Compile time = 1.764 ms 2025-09-07T06:30:53.6045356Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:53.6055627Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6061164Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:30:53.6062293Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:53.6063326Z #22 396.3 ptxas info : Compile time = 0.810 ms 2025-09-07T06:30:53.6068761Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:53.6078694Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6084301Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:53.6085454Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:53.6086456Z #22 396.3 ptxas info : Compile time = 0.810 ms 2025-09-07T06:30:53.6091650Z #22 396.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:53.6173181Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6178329Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:53.6179373Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:30:53.6180298Z #22 396.3 ptxas info : Compile time = 0.562 ms 2025-09-07T06:30:53.6186115Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:53.6196126Z #22 396.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6201540Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:53.6202572Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:53.6203504Z #22 396.3 ptxas info : Compile time = 0.495 ms 2025-09-07T06:30:53.6208923Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:53.6218929Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6224284Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:53.6225290Z #22 396.3 ptxas info : Used 4 registers, 
used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:53.6226197Z #22 396.3 ptxas info : Compile time = 0.475 ms 2025-09-07T06:30:53.6231468Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:53.6241474Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6246627Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:53.6247646Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:30:53.6248539Z #22 396.3 ptxas info : Compile time = 0.472 ms 2025-09-07T06:30:53.6254377Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 
'sm_80' 2025-09-07T06:30:53.6264201Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6269587Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:53.6270603Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:53.6271491Z #22 396.3 ptxas info : Compile time = 0.466 ms 2025-09-07T06:30:53.6276862Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:53.6286785Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6292569Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads 2025-09-07T06:30:53.6293568Z #22 396.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:30:53.6294403Z #22 396.3 ptxas info : Compile time = 0.503 ms 2025-09-07T06:30:53.6295042Z #22 396.3 ptxas info : 10 bytes gmem 2025-09-07T06:30:53.6299990Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:53.6309517Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6314559Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:53.6315503Z #22 396.3 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:30:53.6316269Z #22 396.3 ptxas info : Compile time = 510.320 ms 2025-09-07T06:30:53.6321925Z #22 396.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:53.6331653Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6337202Z #22 396.3 16 bytes stack frame, 52 bytes spill stores, 44 bytes spill loads 2025-09-07T06:30:53.6338385Z #22 396.3 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:53.6339363Z #22 396.3 ptxas info : Compile time = 645.730 ms 2025-09-07T06:30:53.6344728Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:53.6354704Z #22 396.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6359923Z #22 396.3 40 bytes stack frame, 68 bytes spill stores, 112 bytes spill loads 2025-09-07T06:30:53.6361122Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:30:53.6362116Z #22 396.3 ptxas info : Compile time = 1569.514 ms 2025-09-07T06:30:53.6367241Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:53.6377420Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6382533Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:53.6383415Z #22 396.3 ptxas info : Used 168 
registers, used 9 barriers 2025-09-07T06:30:53.6384187Z #22 396.3 ptxas info : Compile time = 1153.062 ms 2025-09-07T06:30:53.6389844Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:53.6399986Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6405237Z #22 396.3 16 bytes stack frame, 44 bytes spill stores, 36 bytes spill loads 2025-09-07T06:30:53.6406386Z #22 396.3 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:30:53.6407341Z #22 396.3 ptxas info : Compile time = 1312.713 ms 2025-09-07T06:30:53.6412842Z #22 396.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:53.6422603Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6428092Z #22 396.3 40 bytes stack frame, 76 bytes spill stores, 128 bytes spill loads 2025-09-07T06:30:53.6429240Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:30:53.6430221Z #22 396.3 ptxas info : Compile time = 2704.458 ms 2025-09-07T06:30:53.6435824Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:53.6445454Z #22 396.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6450737Z #22 396.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:53.6451670Z #22 396.3 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:30:53.6452635Z #22 396.3 ptxas info : Compile time = 916.287 ms 2025-09-07T06:30:53.6458365Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:53.6468058Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6473489Z #22 396.3 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:30:53.6474626Z #22 396.3 ptxas info : Used 168 registers, used 9 barriers, 16 
bytes cumulative stack size 2025-09-07T06:30:53.6475572Z #22 396.3 ptxas info : Compile time = 970.172 ms 2025-09-07T06:30:53.6480686Z #22 396.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:53.6490456Z #22 396.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:53.6495704Z #22 396.3 40 bytes stack frame, 72 bytes spill stores, 112 bytes spill loads 2025-09-07T06:30:53.6496803Z #22 396.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:30:53.6500312Z #22 396.3 ptxas info : Compile time = 2168.226 ms 2025-09-07T06:30:54.4502492Z #22 397.1 [28/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:54.4520539Z #22 397.1 ptxas info : 10 bytes gmem 2025-09-07T06:30:54.4524976Z #22 397.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4532747Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4537254Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4538149Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4538929Z #22 397.1 ptxas info : Compile time = 2.177 ms 2025-09-07T06:30:54.4543585Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4552017Z #22 397.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4557221Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4558200Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4558961Z #22 397.1 ptxas info : Compile time = 31.325 ms 2025-09-07T06:30:54.4563418Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4571461Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4576206Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4577095Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4577855Z #22 397.1 ptxas info : Compile time = 1.139 ms 2025-09-07T06:30:54.4582472Z #22 397.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4591053Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4596944Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4597773Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4598543Z #22 397.1 ptxas info : Compile time = 0.724 ms 2025-09-07T06:30:54.4603661Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4613067Z #22 397.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4618443Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4619295Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4620056Z #22 397.1 ptxas info : Compile time = 0.701 ms 2025-09-07T06:30:54.4625069Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4634247Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4639359Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4640643Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4641432Z #22 397.1 ptxas info : Compile time 
= 0.615 ms 2025-09-07T06:30:54.4646069Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4654793Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4659388Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4660304Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4661010Z #22 397.1 ptxas info : Compile time = 0.631 ms 2025-09-07T06:30:54.4665906Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4675094Z #22 397.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4680083Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4681002Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4682056Z #22 397.1 ptxas info : Compile time = 0.608 ms 2025-09-07T06:30:54.4687023Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4696541Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4701733Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4702672Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4703425Z #22 397.1 ptxas info : Compile time = 
0.674 ms 2025-09-07T06:30:54.4708734Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4717616Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4722606Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4723543Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4724328Z #22 397.1 ptxas info : Compile time = 0.634 ms 2025-09-07T06:30:54.4729551Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4738878Z #22 397.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4743587Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4744522Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4745269Z #22 397.1 ptxas info : Compile time = 0.607 ms 2025-09-07T06:30:54.4750483Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:54.4760031Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4765241Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4766126Z #22 397.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:54.4766910Z #22 397.1 ptxas info : Compile time 
= 0.600 ms 2025-09-07T06:30:54.4767646Z #22 397.1 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:30:54.4772623Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.4781035Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4785649Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4786674Z #22 397.1 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:30:54.4787614Z #22 397.1 ptxas info : Compile time = 909.396 ms 2025-09-07T06:30:54.4792657Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.4801277Z #22 397.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4806054Z #22 397.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:54.4807086Z #22 397.1 ptxas info : Used 248 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:30:54.4808005Z #22 397.1 ptxas info : Compile time = 1753.068 ms 2025-09-07T06:30:54.4813036Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.4822087Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4826937Z #22 397.1 96 bytes stack frame, 124 bytes spill stores, 204 bytes spill loads 2025-09-07T06:30:54.4828284Z #22 397.1 ptxas info : Used 255 registers, used 6 barriers, 96 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:30:54.4829419Z #22 397.1 ptxas info : Compile time = 1902.228 ms 
2025-09-07T06:30:54.4834451Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.4843017Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4847910Z #22 397.1 144 bytes stack frame, 156 bytes spill stores, 180 bytes spill loads 2025-09-07T06:30:54.4849233Z #22 397.1 ptxas info : Used 255 registers, used 2 barriers, 144 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:30:54.4850385Z #22 397.1 ptxas info : Compile time = 1119.016 ms 2025-09-07T06:30:54.4855768Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.4865166Z #22 397.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4870249Z #22 397.1 56 bytes stack frame, 72 bytes spill stores, 84 bytes spill loads 2025-09-07T06:30:54.4871502Z #22 397.1 ptxas info : Used 255 registers, used 6 barriers, 56 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:54.4872708Z #22 397.1 ptxas info : Compile time = 1415.596 ms 2025-09-07T06:30:54.4877834Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.4887558Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4931244Z #22 397.1 48 bytes stack frame, 60 bytes spill stores, 72 bytes spill loads 2025-09-07T06:30:54.4932736Z #22 397.1 ptxas info : Used 255 registers, used 6 barriers, 
48 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:54.4933870Z #22 397.1 ptxas info : Compile time = 2500.009 ms 2025-09-07T06:30:54.4938925Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.4947595Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4952357Z #22 397.1 112 bytes stack frame, 160 bytes spill stores, 228 bytes spill loads 2025-09-07T06:30:54.4953709Z #22 397.1 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:30:54.4954889Z #22 397.1 ptxas info : Compile time = 2233.017 ms 2025-09-07T06:30:54.4959963Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.4969116Z 
#22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4974484Z #22 397.1 112 bytes stack frame, 236 bytes spill stores, 332 bytes spill loads 2025-09-07T06:30:54.4975860Z #22 397.1 ptxas info : Used 255 registers, used 6 barriers, 112 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:54.4977032Z #22 397.1 ptxas info : Compile time = 2595.566 ms 2025-09-07T06:30:54.4982033Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.4991672Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.4997653Z #22 397.1 112 bytes stack frame, 240 bytes spill stores, 388 bytes spill loads 2025-09-07T06:30:54.4998960Z #22 397.1 
ptxas info : Used 255 registers, used 6 barriers, 112 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:54.5000112Z #22 397.1 ptxas info : Compile time = 4665.470 ms 2025-09-07T06:30:54.5005295Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.5014432Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.5019348Z #22 397.1 144 bytes stack frame, 232 bytes spill stores, 448 bytes spill loads 2025-09-07T06:30:54.5020681Z #22 397.1 ptxas info : Used 255 registers, used 6 barriers, 144 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:30:54.5021805Z #22 397.1 ptxas info : Compile time = 2298.780 ms 2025-09-07T06:30:54.5026729Z #22 397.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.5035966Z #22 397.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.5041182Z #22 397.1 144 bytes stack frame, 264 bytes spill stores, 456 bytes spill loads 2025-09-07T06:30:54.5042503Z #22 397.1 ptxas info : Used 255 registers, used 6 barriers, 144 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:54.5043651Z #22 397.1 ptxas info : Compile time = 2612.367 ms 2025-09-07T06:30:54.6001189Z #22 397.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:54.6011060Z #22 397.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:54.6016359Z #22 397.1 136 bytes stack frame, 248 bytes spill stores, 324 bytes spill loads 2025-09-07T06:30:54.6017681Z #22 397.1 ptxas info : Used 255 registers, used 6 barriers, 136 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:30:54.6018855Z #22 397.1 ptxas info : Compile time = 4737.574 ms 2025-09-07T06:30:57.6626841Z #22 400.3 [29/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda 
-D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:30:57.6648978Z #22 400.3 ptxas info : 10 bytes gmem 2025-09-07T06:30:57.6654725Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6664810Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.6670803Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.6671918Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.6672837Z #22 
400.3 ptxas info : Compile time = 2.145 ms 2025-09-07T06:30:57.6678411Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6688612Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.6694637Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.6695768Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.6698750Z #22 400.3 ptxas info : Compile time = 1.049 ms 2025-09-07T06:30:57.6704350Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6714456Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.6720093Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.6721218Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.6722171Z #22 400.3 ptxas info : Compile time = 0.695 ms 2025-09-07T06:30:57.6727833Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6738218Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.6743787Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.6744914Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.6745846Z #22 400.3 ptxas info : Compile time = 20.849 ms 2025-09-07T06:30:57.6751507Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6762089Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.6767705Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.6768815Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.6769770Z #22 400.3 ptxas info : Compile time = 0.852 ms 2025-09-07T06:30:57.6775551Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6786034Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.6791067Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.6792422Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.6793287Z #22 400.3 ptxas info : Compile time = 0.718 ms 2025-09-07T06:30:57.6798250Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6807283Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.6812265Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.6813431Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.6814272Z #22 400.3 ptxas info : Compile time = 0.702 ms 2025-09-07T06:30:57.6819214Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6828290Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.6833687Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.6834679Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.6835518Z #22 400.3 ptxas info : Compile time = 0.642 ms 2025-09-07T06:30:57.6840323Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6849067Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 
2025-09-07T06:30:57.6854136Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.6855504Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.6856337Z #22 400.3 ptxas info : Compile time = 0.703 ms 2025-09-07T06:30:57.6861122Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6869780Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.6874623Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.6875669Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.6876517Z #22 400.3 ptxas info : Compile time = 0.618 ms 2025-09-07T06:30:57.6878998Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.6883088Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:57.6885649Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:30:57.6886689Z #22 400.3 ptxas info : Used 39 registers, used 0 barriers 2025-09-07T06:30:57.6887571Z #22 400.3 ptxas info : Compile time = 40.093 ms 2025-09-07T06:30:57.6993555Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7002957Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7008429Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7009445Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7010297Z #22 400.3 ptxas info : Compile time = 0.933 ms 2025-09-07T06:30:57.7014922Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7023325Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:57.7027867Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7028891Z #22 400.3 ptxas info : Used 40 registers, used 1 barriers 2025-09-07T06:30:57.7029732Z #22 400.3 ptxas info : Compile time = 47.733 ms 2025-09-07T06:30:57.7034721Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7043861Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7048906Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7049919Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7050784Z #22 400.3 ptxas info : Compile time = 0.926 ms 2025-09-07T06:30:57.7053469Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7057484Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:57.7060011Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7061019Z #22 400.3 ptxas info : Used 42 registers, used 0 barriers 2025-09-07T06:30:57.7061835Z #22 400.3 ptxas info : Compile time = 25.455 ms 2025-09-07T06:30:57.7066858Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7076227Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7081243Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7082276Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7083139Z #22 400.3 ptxas info : Compile time = 0.946 ms 2025-09-07T06:30:57.7088362Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7097542Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7101984Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7102962Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7103778Z #22 400.3 ptxas info : Compile time = 0.755 ms 2025-09-07T06:30:57.7108608Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7116690Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7121216Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7122273Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7123135Z #22 400.3 ptxas info : Compile time = 0.697 ms 2025-09-07T06:30:57.7128079Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7137282Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7142693Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7143717Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7144541Z #22 400.3 ptxas info : Compile time = 0.630 ms 2025-09-07T06:30:57.7149632Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7158882Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7163988Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7165001Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7165875Z #22 400.3 ptxas info : Compile time = 0.657 ms 2025-09-07T06:30:57.7171190Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7180458Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7185537Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7186557Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7187421Z #22 400.3 ptxas info : Compile time = 0.607 ms 2025-09-07T06:30:57.7192773Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7201839Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7206813Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7207831Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7208664Z #22 400.3 ptxas info : Compile time = 0.613 ms 2025-09-07T06:30:57.7213843Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7223596Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7228586Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7229582Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7230444Z #22 400.3 ptxas info : Compile time = 0.596 ms 2025-09-07T06:30:57.7235251Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7244387Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 
2025-09-07T06:30:57.7249242Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7250268Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7251136Z #22 400.3 ptxas info : Compile time = 0.619 ms 2025-09-07T06:30:57.7256119Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7264882Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7269767Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7270879Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7271735Z #22 400.3 ptxas info : Compile time = 0.610 ms 2025-09-07T06:30:57.7274391Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7278286Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:57.7280165Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:30:57.7280897Z #22 400.3 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:30:57.7281570Z #22 400.3 ptxas info : Compile time = 34.302 ms 2025-09-07T06:30:57.7285360Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7292536Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7296290Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7297068Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7297734Z #22 400.3 ptxas info : Compile time = 0.936 ms 2025-09-07T06:30:57.7301505Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7307736Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:57.7311331Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7312134Z #22 400.3 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:30:57.7312800Z #22 400.3 ptxas info : Compile time = 27.544 ms 2025-09-07T06:30:57.7316651Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7323564Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7327421Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7328207Z #22 400.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:30:57.7328846Z #22 400.3 ptxas info : Compile time = 0.940 ms 2025-09-07T06:30:57.7330832Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:30:57.7334241Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:57.7336211Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7337359Z #22 400.3 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:30:57.7338029Z #22 400.3 ptxas info : Compile time = 38.673 ms 2025-09-07T06:30:57.7338655Z #22 400.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:30:57.7342518Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7349407Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7353230Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7354105Z #22 400.3 ptxas info : Used 249 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7354850Z #22 400.3 ptxas info : Compile time = 441.561 ms 2025-09-07T06:30:57.7358872Z #22 400.3 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7365734Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7369530Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7370412Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7371149Z #22 400.3 ptxas info : Compile time = 406.753 ms 2025-09-07T06:30:57.7375131Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7382015Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7385809Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7386694Z #22 400.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7387444Z #22 400.3 ptxas info : Compile time = 487.429 ms 2025-09-07T06:30:57.7391245Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7398712Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7402238Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7403036Z #22 400.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7403744Z #22 400.3 ptxas info : Compile time = 435.056 ms 2025-09-07T06:30:57.7407224Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7414925Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7418774Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7419696Z #22 400.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7420496Z #22 400.3 ptxas info : Compile time = 531.418 ms 2025-09-07T06:30:57.7424578Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7430906Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7434625Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7435516Z #22 400.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7436279Z #22 400.3 ptxas info : Compile time = 479.997 ms 2025-09-07T06:30:57.7439714Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7446502Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7453209Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7454169Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7454926Z #22 400.3 ptxas info : Compile time = 521.123 ms 2025-09-07T06:30:57.7459133Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7465666Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7469460Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7470304Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7471018Z #22 400.3 ptxas info : Compile time = 476.547 ms 2025-09-07T06:30:57.7474439Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7480623Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7484086Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7484934Z #22 400.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7485659Z #22 400.3 ptxas info : Compile time = 469.096 ms 2025-09-07T06:30:57.7489029Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7495485Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7498867Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7499628Z #22 400.3 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7500327Z #22 400.3 ptxas info : Compile time = 431.789 ms 2025-09-07T06:30:57.7502342Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7505163Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:57.7506951Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7507743Z #22 400.3 ptxas info : Used 39 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:57.7508440Z #22 400.3 ptxas info : Compile time = 21.480 ms 2025-09-07T06:30:57.7512054Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7518810Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7522366Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7523204Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7523909Z #22 400.3 ptxas info : Compile time = 490.715 ms 2025-09-07T06:30:57.7527065Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7533054Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:57.7536279Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7537095Z #22 400.3 ptxas info : Used 40 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:57.7537825Z #22 400.3 ptxas info : Compile time = 15.534 ms 2025-09-07T06:30:57.7541375Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7547801Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:30:57.7551582Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7552429Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:30:57.7553133Z #22 400.3 ptxas info : Compile time = 444.681 ms 2025-09-07T06:30:57.7554955Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7557827Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:57.7559631Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7560466Z #22 400.3 ptxas info : Used 45 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:57.7561191Z #22 400.3 ptxas info : Compile time = 23.534 ms 2025-09-07T06:30:57.7564965Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7571448Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7575192Z #22 400.3 96 bytes 
stack frame, 92 bytes spill stores, 148 bytes spill loads 2025-09-07T06:30:57.7576234Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.7577139Z #22 400.3 ptxas info : Compile time = 823.661 ms 2025-09-07T06:30:57.7580650Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7587104Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7590726Z #22 400.3 64 bytes stack frame, 56 bytes spill stores, 72 bytes spill loads 2025-09-07T06:30:57.7591754Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.7592977Z #22 400.3 ptxas info : Compile time = 752.429 ms 2025-09-07T06:30:57.7596532Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7603022Z 
#22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7606936Z #22 400.3 72 bytes stack frame, 68 bytes spill stores, 72 bytes spill loads 2025-09-07T06:30:57.7607986Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.7608854Z #22 400.3 ptxas info : Compile time = 842.221 ms 2025-09-07T06:30:57.7612540Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7619323Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7622904Z #22 400.3 64 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:30:57.7623925Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.7624848Z #22 400.3 ptxas info : Compile time = 786.998 ms 2025-09-07T06:30:57.7628474Z 
#22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7650160Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7654119Z #22 400.3 88 bytes stack frame, 92 bytes spill stores, 124 bytes spill loads 2025-09-07T06:30:57.7655149Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.7656060Z #22 400.3 ptxas info : Compile time = 893.847 ms 2025-09-07T06:30:57.7659636Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7666078Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7669930Z #22 400.3 88 bytes stack frame, 84 bytes spill stores, 112 bytes spill loads 2025-09-07T06:30:57.7670950Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.7671839Z #22 400.3 ptxas info : Compile time = 835.367 ms 2025-09-07T06:30:57.7675439Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7681865Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7685517Z #22 400.3 104 bytes stack frame, 104 bytes spill stores, 120 bytes spill loads 2025-09-07T06:30:57.7686773Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.7687695Z #22 400.3 ptxas info : Compile time = 915.844 ms 2025-09-07T06:30:57.7691200Z #22 400.3 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7697994Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7701545Z #22 400.3 104 bytes stack frame, 104 bytes spill stores, 116 bytes spill loads 2025-09-07T06:30:57.7702593Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.7703463Z #22 400.3 ptxas info : Compile time = 865.727 ms 2025-09-07T06:30:57.7706916Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7713150Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7716638Z #22 400.3 88 bytes stack frame, 88 bytes spill stores, 120 bytes spill loads 2025-09-07T06:30:57.7717658Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.7718554Z #22 400.3 ptxas info : Compile time = 822.110 ms 2025-09-07T06:30:57.7722356Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7728649Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7732090Z #22 400.3 96 bytes stack frame, 96 bytes spill stores, 120 bytes spill loads 2025-09-07T06:30:57.7733282Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.7734210Z #22 400.3 ptxas info : Compile time = 777.581 ms 2025-09-07T06:30:57.7736107Z #22 400.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7739461Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:57.7741385Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.7742215Z #22 400.3 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:57.7742919Z #22 400.3 ptxas info : Compile time = 33.257 ms 2025-09-07T06:30:57.7746486Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.7752961Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.7756612Z #22 400.3 96 bytes stack frame, 96 bytes spill stores, 100 bytes spill loads 2025-09-07T06:30:57.8121902Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.8123549Z #22 400.3 ptxas info : Compile time = 843.697 ms 2025-09-07T06:30:57.8128098Z #22 400.3 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.8135439Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:30:57.8139103Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.8139915Z #22 400.3 ptxas info : Used 56 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:30:57.8140633Z #22 400.3 ptxas info : Compile time = 21.239 ms 2025-09-07T06:30:57.8144177Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.8150324Z #22 400.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:30:57.8153826Z #22 400.3 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads 2025-09-07T06:30:57.8155093Z #22 400.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:30:57.8155958Z #22 400.3 ptxas info : Compile time = 789.840 ms 2025-09-07T06:30:57.8157694Z #22 400.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:30:57.8160540Z #22 400.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:30:57.8162284Z #22 400.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:30:57.8163109Z #22 400.3 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:30:57.8163790Z #22 400.3 ptxas info : Compile time = 34.902 ms 2025-09-07T06:31:05.7250080Z #22 408.4 [30/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:05.7270507Z #22 408.4 ptxas info : 10 bytes gmem 2025-09-07T06:31:05.7275357Z #22 408.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7284063Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7288915Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7290229Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7291025Z #22 408.4 ptxas info : Compile time = 2.137 ms 2025-09-07T06:31:05.7299339Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7308688Z #22 408.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7313649Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7314600Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7315368Z #22 408.4 ptxas info : Compile time = 1.062 ms 2025-09-07T06:31:05.7320372Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7329558Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7334596Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7335589Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7336436Z #22 408.4 ptxas info : Compile time = 21.304 ms 2025-09-07T06:31:05.7341602Z #22 408.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7350065Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7354777Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7355780Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7356587Z #22 408.4 ptxas info : Compile time = 0.787 ms 2025-09-07T06:31:05.7361978Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7371203Z #22 408.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7376481Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7377461Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7378298Z #22 408.4 ptxas info : Compile time = 0.716 ms 2025-09-07T06:31:05.7383427Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7393214Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7399002Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7409059Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7409875Z #22 408.4 ptxas info : Compile time = 0.619 ms 
2025-09-07T06:31:05.7414999Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7425039Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7430934Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7431922Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7432722Z #22 408.4 ptxas info : Compile time = 0.648 ms 2025-09-07T06:31:05.7438420Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7448255Z #22 408.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7453873Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7454834Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7455655Z #22 408.4 ptxas info : Compile time = 0.631 ms 2025-09-07T06:31:05.7460975Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7470917Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7476359Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7477349Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7478135Z #22 408.4 ptxas info : Compile time = 0.657 ms 
2025-09-07T06:31:05.7483447Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7494116Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7499340Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7500328Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7501133Z #22 408.4 ptxas info : Compile time = 0.607 ms 2025-09-07T06:31:05.7507078Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7517422Z #22 408.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7522849Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7523835Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7524628Z #22 408.4 ptxas info : Compile time = 0.616 ms 2025-09-07T06:31:05.7529997Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:05.7539948Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7545270Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7546208Z #22 408.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:05.7547031Z #22 408.4 ptxas info : Compile time = 0.603 ms 
2025-09-07T06:31:05.7547819Z #22 408.4 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:05.7552638Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7561608Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7566533Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7567742Z #22 408.4 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:31:05.7568726Z #22 408.4 ptxas info : Compile time = 1053.648 ms 2025-09-07T06:31:05.7574157Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7584033Z #22 408.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7589242Z #22 408.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:05.7590270Z #22 408.4 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:31:05.7591214Z #22 408.4 ptxas info : Compile time = 1751.660 ms 2025-09-07T06:31:05.7596474Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7606127Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7611218Z #22 408.4 216 bytes stack frame, 232 bytes spill stores, 332 bytes spill loads 2025-09-07T06:31:05.7612761Z #22 408.4 ptxas info : Used 255 registers, used 6 barriers, 216 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:31:05.7614494Z #22 408.4 ptxas info : Compile time = 2143.336 ms 
2025-09-07T06:31:05.7620160Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7629481Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7634657Z #22 408.4 40 bytes stack frame, 52 bytes spill stores, 68 bytes spill loads 2025-09-07T06:31:05.7636029Z #22 408.4 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:05.7637271Z #22 408.4 ptxas info : Compile time = 1198.719 ms 2025-09-07T06:31:05.7643383Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7654758Z #22 408.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7660373Z #22 408.4 176 bytes stack frame, 192 bytes spill stores, 332 bytes spill loads 2025-09-07T06:31:05.7661745Z #22 408.4 ptxas info : Used 255 registers, used 6 barriers, 176 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:31:05.7662962Z #22 408.4 ptxas info : Compile time = 1485.133 ms 2025-09-07T06:31:05.7668431Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7678319Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7683719Z #22 408.4 248 bytes stack frame, 340 bytes spill stores, 544 bytes spill loads 2025-09-07T06:31:05.7685129Z #22 408.4 ptxas info : Used 255 registers, used 6 barriers, 248 
bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:31:05.7686349Z #22 408.4 ptxas info : Compile time = 2891.701 ms 2025-09-07T06:31:05.7691393Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7701271Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7706555Z #22 408.4 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:31:05.7707962Z #22 408.4 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:05.7709191Z #22 408.4 ptxas info : Compile time = 2482.646 ms 2025-09-07T06:31:05.7714614Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7724821Z #22 408.4 ptxas 
info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7730321Z #22 408.4 48 bytes stack frame, 64 bytes spill stores, 76 bytes spill loads 2025-09-07T06:31:05.7731681Z #22 408.4 ptxas info : Used 255 registers, used 6 barriers, 48 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:31:05.7733865Z #22 408.4 ptxas info : Compile time = 3296.487 ms 2025-09-07T06:31:05.7739261Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7750583Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7756032Z #22 408.4 112 bytes stack frame, 204 bytes spill stores, 296 bytes spill loads 2025-09-07T06:31:05.7757446Z #22 408.4 ptxas info : Used 255 registers, 
used 6 barriers, 112 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:31:05.7758679Z #22 408.4 ptxas info : Compile time = 5946.956 ms 2025-09-07T06:31:05.7763781Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7773635Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb1ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7779010Z #22 408.4 272 bytes stack frame, 572 bytes spill stores, 700 bytes spill loads 2025-09-07T06:31:05.7780409Z #22 408.4 ptxas info : Used 255 registers, used 6 barriers, 272 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:31:05.7781634Z #22 408.4 ptxas info : Compile time = 2764.480 ms 2025-09-07T06:31:05.7787162Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 
'sm_80' 2025-09-07T06:31:05.7797802Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7803230Z #22 408.4 256 bytes stack frame, 376 bytes spill stores, 652 bytes spill loads 2025-09-07T06:31:05.7804647Z #22 408.4 ptxas info : Used 255 registers, used 6 barriers, 256 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:31:05.7805888Z #22 408.4 ptxas info : Compile time = 2858.957 ms 2025-09-07T06:31:05.7811305Z #22 408.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:05.7821504Z #22 408.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi112EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb1EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi256ELb1ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:05.7826986Z #22 408.4 288 bytes stack frame, 428 bytes spill stores, 764 bytes spill loads 
2025-09-07T06:31:05.7828364Z #22 408.4 ptxas info : Used 255 registers, used 6 barriers, 288 bytes cumulative stack size, 1408 bytes cmem[0] 2025-09-07T06:31:05.7829590Z #22 408.4 ptxas info : Compile time = 4744.746 ms 2025-09-07T06:31:06.1913967Z #22 408.9 [31/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:06.3447333Z #22 408.9 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:06.3453073Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:06.3462640Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3467823Z #22 408.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:06.3468931Z #22 408.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:06.3469871Z #22 408.9 ptxas info : Compile time = 1.869 ms 2025-09-07T06:31:06.3475356Z #22 408.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:06.3485583Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3490891Z #22 408.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:06.3492323Z #22 408.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:06.3493653Z #22 408.9 ptxas info : Compile time = 0.870 ms 2025-09-07T06:31:06.3498987Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:06.3509733Z #22 408.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3515577Z #22 408.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:06.3516719Z #22 408.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:06.3517694Z #22 408.9 ptxas info : Compile time = 0.703 ms 2025-09-07T06:31:06.3523882Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:06.3534821Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3540226Z #22 408.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:06.3541198Z #22 408.9 ptxas info : Used 4 registers, used 0 barriers, 2560 
bytes cmem[0] 2025-09-07T06:31:06.3542140Z #22 408.9 ptxas info : Compile time = 0.795 ms 2025-09-07T06:31:06.3547302Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:06.3557084Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3562476Z #22 408.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:06.3563537Z #22 408.9 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:31:06.3564692Z #22 408.9 ptxas info : Compile time = 0.559 ms 2025-09-07T06:31:06.3570165Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:31:06.3580712Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3586107Z #22 408.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:06.3587134Z #22 408.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:06.3588027Z #22 408.9 ptxas info : Compile time = 0.562 ms 2025-09-07T06:31:06.3593560Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:06.3601290Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3605527Z #22 408.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads 2025-09-07T06:31:06.3606371Z #22 408.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:06.3607081Z #22 408.9 ptxas info : Compile time = 0.626 ms 2025-09-07T06:31:06.3611089Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:06.3618597Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3622657Z #22 408.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:06.3623707Z #22 408.9 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:31:06.3624430Z #22 408.9 ptxas info : Compile time = 0.551 ms 2025-09-07T06:31:06.3628601Z #22 408.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:06.3636526Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3640764Z #22 408.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:06.3641624Z #22 408.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:06.3642341Z #22 408.9 ptxas info : Compile time = 0.552 ms 2025-09-07T06:31:06.3646847Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:06.3654906Z #22 408.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3659228Z #22 408.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:06.3660075Z #22 408.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:31:06.3660830Z #22 408.9 ptxas info : Compile time = 0.531 ms 2025-09-07T06:31:06.3661379Z #22 408.9 ptxas info : 10 bytes gmem 2025-09-07T06:31:06.3665345Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:06.3672660Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3676738Z #22 408.9 24 bytes stack frame, 52 bytes spill stores, 56 bytes spill loads 2025-09-07T06:31:06.3677864Z #22 408.9 ptxas info : Used 168 registers, used 9 barriers, 24 
bytes cumulative stack size 2025-09-07T06:31:06.3678698Z #22 408.9 ptxas info : Compile time = 682.106 ms 2025-09-07T06:31:06.3682780Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:06.3690380Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3695226Z #22 408.9 32 bytes stack frame, 100 bytes spill stores, 104 bytes spill loads 2025-09-07T06:31:06.3696249Z #22 408.9 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:31:06.3697216Z #22 408.9 ptxas info : Compile time = 700.308 ms 2025-09-07T06:31:06.3702530Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:06.3711872Z 
#22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3716966Z #22 408.9 56 bytes stack frame, 204 bytes spill stores, 220 bytes spill loads 2025-09-07T06:31:06.3718025Z #22 408.9 ptxas info : Used 168 registers, used 9 barriers, 56 bytes cumulative stack size 2025-09-07T06:31:06.3719009Z #22 408.9 ptxas info : Compile time = 861.511 ms 2025-09-07T06:31:06.3724138Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:06.3733675Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3738717Z #22 408.9 64 bytes stack frame, 276 bytes spill stores, 316 bytes spill loads 
2025-09-07T06:31:06.3740108Z #22 408.9 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:31:06.3741080Z #22 408.9 ptxas info : Compile time = 1723.630 ms 2025-09-07T06:31:06.3746095Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:06.3755421Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3760419Z #22 408.9 8 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads 2025-09-07T06:31:06.3761512Z #22 408.9 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:31:06.3762461Z #22 408.9 ptxas info : Compile time = 1218.919 ms 2025-09-07T06:31:06.3767800Z #22 408.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:06.3777434Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3782620Z #22 408.9 48 bytes stack frame, 100 bytes spill stores, 128 bytes spill loads 2025-09-07T06:31:06.3783769Z #22 408.9 ptxas info : Used 168 registers, used 9 barriers, 48 bytes cumulative stack size 2025-09-07T06:31:06.3784736Z #22 408.9 ptxas info : Compile time = 1451.971 ms 2025-09-07T06:31:06.3789872Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:06.3799628Z #22 408.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3804998Z #22 408.9 56 bytes stack frame, 300 bytes spill stores, 344 bytes spill loads 2025-09-07T06:31:06.3806118Z #22 408.9 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:31:06.3807094Z #22 408.9 ptxas info : Compile time = 2630.339 ms 2025-09-07T06:31:06.3812119Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:06.3821166Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3825773Z #22 408.9 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:06.3826896Z #22 408.9 ptxas info : Used 168 registers, used 9 
barriers, 8 bytes cumulative stack size 2025-09-07T06:31:06.3827881Z #22 408.9 ptxas info : Compile time = 1115.673 ms 2025-09-07T06:31:06.3833334Z #22 408.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:06.3842875Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3848111Z #22 408.9 32 bytes stack frame, 120 bytes spill stores, 148 bytes spill loads 2025-09-07T06:31:06.3849294Z #22 408.9 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:31:06.3850283Z #22 408.9 ptxas info : Compile time = 1250.907 ms 2025-09-07T06:31:06.3855627Z #22 408.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:06.3865103Z #22 408.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:06.3870573Z #22 408.9 48 bytes stack frame, 232 bytes spill stores, 276 bytes spill loads 2025-09-07T06:31:06.3871827Z #22 408.9 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:31:06.3872854Z #22 408.9 ptxas info : Compile time = 2359.371 ms 2025-09-07T06:31:21.0184306Z #22 423.7 [32/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:21.0204978Z #22 423.7 ptxas info : 10 bytes gmem 2025-09-07T06:31:21.0210249Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0219982Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0225309Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0226359Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0227226Z #22 423.7 ptxas info : Compile time = 2.175 ms 2025-09-07T06:31:21.0232557Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0242453Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0247873Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0248916Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0249752Z #22 423.7 ptxas info : Compile time = 1.045 ms 2025-09-07T06:31:21.0255253Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0264928Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0270186Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0271567Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0272400Z #22 423.7 ptxas info : Compile time = 0.751 ms 2025-09-07T06:31:21.0277685Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0287350Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0300376Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0301446Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0302255Z #22 423.7 ptxas info : Compile time = 0.657 ms 2025-09-07T06:31:21.0307550Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0317105Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0322727Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0323784Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0324597Z #22 423.7 ptxas info : Compile time = 0.651 ms 2025-09-07T06:31:21.0329951Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0340007Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0345363Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0346358Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0347230Z #22 423.7 ptxas info : Compile time = 0.618 ms 2025-09-07T06:31:21.0352820Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0362470Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0367784Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0368796Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0369639Z #22 423.7 ptxas info : Compile time = 0.623 ms 2025-09-07T06:31:21.0375098Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0384724Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0389984Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0390996Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0391868Z #22 423.7 ptxas info : Compile time = 0.610 ms 2025-09-07T06:31:21.0397213Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0406797Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0412055Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0413177Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0414039Z #22 423.7 ptxas info : Compile time = 0.731 ms 2025-09-07T06:31:21.0419189Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0429304Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:21.0434435Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0435789Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0437022Z #22 423.7 ptxas info : Compile time = 0.602 ms 2025-09-07T06:31:21.0443005Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0452904Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0458257Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0459238Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0460038Z #22 423.7 ptxas info : Compile time = 0.620 ms 2025-09-07T06:31:21.0465298Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0474852Z #22 423.7 ptxas info : Function 
properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0480372Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0481487Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0482342Z #22 423.7 ptxas info : Compile time = 0.610 ms 2025-09-07T06:31:21.0487709Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0497667Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0502996Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0504043Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0504862Z #22 423.7 ptxas info : Compile time = 0.619 ms 2025-09-07T06:31:21.0510347Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0520022Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0525342Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0526379Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0527239Z #22 423.7 ptxas info : Compile time = 0.637 ms 2025-09-07T06:31:21.0532682Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0542111Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0547455Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0548502Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0549352Z #22 423.7 ptxas info : Compile time = 0.619 ms 2025-09-07T06:31:21.0554813Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0564478Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0569772Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0570781Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0571642Z #22 423.7 ptxas info : Compile time = 0.608 ms 2025-09-07T06:31:21.0577091Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0586958Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0592517Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0593527Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0594402Z #22 423.7 ptxas info : Compile time = 20.809 ms 2025-09-07T06:31:21.0599687Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0609261Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0614781Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0615816Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0616666Z #22 423.7 ptxas info : Compile time = 0.938 ms 2025-09-07T06:31:21.0621933Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0631754Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0637225Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0638250Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0639116Z #22 423.7 ptxas info : Compile time = 0.797 ms 2025-09-07T06:31:21.0644381Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0654170Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0659443Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0660721Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0661540Z #22 423.7 ptxas info : Compile time = 0.694 ms 2025-09-07T06:31:21.0666663Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0676054Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0681127Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0682178Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0683045Z #22 423.7 ptxas info : Compile time = 0.641 ms 2025-09-07T06:31:21.0688155Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0697793Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0703061Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0704085Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0704942Z #22 423.7 ptxas info : Compile time = 0.623 ms 2025-09-07T06:31:21.0707620Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0712105Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:21.0714773Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0715814Z #22 423.7 ptxas info : Used 44 registers, used 0 barriers 2025-09-07T06:31:21.0716663Z #22 423.7 ptxas info : Compile time = 23.495 ms 2025-09-07T06:31:21.0721972Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0731812Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0737293Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0738340Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0739173Z #22 423.7 ptxas info : Compile time = 0.982 ms 2025-09-07T06:31:21.0744115Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0752977Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:21.0757895Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0758963Z #22 423.7 ptxas info : Used 68 registers, used 1 barriers 2025-09-07T06:31:21.0759831Z #22 423.7 ptxas info : Compile time = 34.969 ms 2025-09-07T06:31:21.0764536Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0773330Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 
2025-09-07T06:31:21.0778285Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0779403Z #22 423.7 ptxas info : Used 44 registers, used 1 barriers 2025-09-07T06:31:21.0780280Z #22 423.7 ptxas info : Compile time = 21.624 ms 2025-09-07T06:31:21.0785610Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0795527Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0800781Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0801790Z #22 423.7 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:21.0802641Z #22 423.7 ptxas info : Compile time = 1.044 ms 2025-09-07T06:31:21.0805610Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:21.0810044Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:21.0812878Z #22 423.7 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.0813920Z #22 423.7 ptxas info : Used 50 registers, used 0 barriers 2025-09-07T06:31:21.0814787Z #22 423.7 ptxas info : Compile time = 26.733 ms 2025-09-07T06:31:21.0815640Z #22 423.7 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:21.0820916Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.0830498Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0835901Z #22 423.7 112 bytes stack frame, 104 bytes spill stores, 160 bytes spill loads 2025-09-07T06:31:21.0837424Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.0838705Z #22 423.7 ptxas info : Compile time = 658.264 ms 2025-09-07T06:31:21.0844044Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.0853955Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0859356Z #22 423.7 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:21.0860807Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.0862089Z #22 423.7 ptxas info : Compile time = 617.565 ms 2025-09-07T06:31:21.0867397Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.0877193Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0882528Z #22 423.7 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:21.0883981Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.0885268Z #22 423.7 ptxas info : Compile time = 667.570 ms 2025-09-07T06:31:21.0890596Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.0900633Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0905855Z #22 423.7 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:21.0907314Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.0908547Z #22 423.7 ptxas info : Compile time = 618.185 ms 2025-09-07T06:31:21.0913883Z #22 423.7 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.0923612Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0929155Z #22 423.7 80 bytes stack frame, 76 bytes spill stores, 100 bytes spill loads 2025-09-07T06:31:21.0930583Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.0931870Z #22 423.7 ptxas info : Compile time = 710.066 ms 2025-09-07T06:31:21.0937353Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.0947007Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0952290Z #22 423.7 72 bytes stack frame, 72 bytes spill stores, 96 bytes spill loads 2025-09-07T06:31:21.0953972Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.0955240Z #22 423.7 ptxas info : Compile time = 663.387 ms 2025-09-07T06:31:21.0960571Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.0970310Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0975728Z #22 423.7 72 bytes stack frame, 76 bytes spill stores, 80 bytes spill loads 2025-09-07T06:31:21.0977180Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.0978423Z #22 423.7 ptxas info : Compile time = 707.494 ms 2025-09-07T06:31:21.0983677Z #22 423.7 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.0993597Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.0998974Z #22 423.7 72 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads 2025-09-07T06:31:21.1000408Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.1001793Z #22 423.7 ptxas info : Compile time = 671.860 ms 2025-09-07T06:31:21.1006905Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1015498Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1020641Z #22 423.7 40 bytes stack frame, 40 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:21.1022122Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.1023383Z #22 423.7 ptxas info : Compile time = 774.100 ms 2025-09-07T06:31:21.1028820Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1038000Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1043147Z #22 423.7 40 bytes stack frame, 40 bytes spill stores, 64 bytes spill loads 2025-09-07T06:31:21.1044553Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.1045854Z #22 423.7 ptxas info : Compile time = 628.578 ms 2025-09-07T06:31:21.1051156Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1060977Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1066286Z #22 423.7 40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads 2025-09-07T06:31:21.1067749Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.1068988Z #22 423.7 ptxas info : Compile time = 683.394 ms 2025-09-07T06:31:21.1074372Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1084060Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1089415Z #22 423.7 40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads 2025-09-07T06:31:21.1090863Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.1092460Z #22 423.7 ptxas info : Compile time = 654.740 ms 2025-09-07T06:31:21.1097875Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1107718Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1112966Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1114115Z #22 423.7 ptxas info : Used 250 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:21.1115130Z #22 423.7 ptxas info : Compile time = 551.086 ms 2025-09-07T06:31:21.1120444Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1130062Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1135525Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1136642Z #22 423.7 ptxas info : Used 240 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:21.1137660Z #22 423.7 ptxas info : Compile time = 536.055 ms 2025-09-07T06:31:21.1142976Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1152823Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1158251Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1159437Z #22 423.7 ptxas info : Used 246 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:21.1160449Z #22 423.7 ptxas info : Compile time = 583.291 ms 2025-09-07T06:31:21.1165705Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1175575Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1180902Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1182030Z #22 423.7 ptxas info : Used 246 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:21.1183042Z #22 423.7 ptxas info : Compile time = 550.539 ms 2025-09-07T06:31:21.1188307Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1198261Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1203590Z #22 423.7 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:21.1205024Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.1206323Z #22 423.7 ptxas info : Compile time = 694.995 ms 2025-09-07T06:31:21.1211649Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1221440Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1226902Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1228184Z #22 423.7 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:21.1229191Z #22 423.7 ptxas info : Compile time = 592.409 ms 2025-09-07T06:31:21.1234511Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1244159Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1249476Z #22 423.7 24 bytes stack frame, 28 bytes spill stores, 32 bytes spill loads 2025-09-07T06:31:21.1250920Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.1252228Z #22 423.7 ptxas info : Compile time = 728.018 ms 2025-09-07T06:31:21.1258090Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1267740Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1273046Z #22 423.7 24 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:31:21.1274519Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:21.1275771Z #22 423.7 ptxas info : Compile time = 687.476 ms 2025-09-07T06:31:21.1280916Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1290250Z #22 423.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1295800Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1297125Z #22 423.7 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:21.1298062Z #22 423.7 ptxas info : Compile time = 598.517 ms 2025-09-07T06:31:21.1303180Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1312663Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1317737Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1318906Z #22 423.7 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:21.1319906Z #22 423.7 ptxas info : Compile time = 544.893 ms 2025-09-07T06:31:21.1322598Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1327143Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:21.1329817Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1330987Z #22 423.7 ptxas info : Used 47 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:21.1331963Z #22 423.7 ptxas info : Compile time = 18.864 ms 2025-09-07T06:31:21.1337311Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1346998Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1352192Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1353374Z #22 423.7 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:21.1354302Z #22 423.7 ptxas info : Compile time = 601.182 ms 2025-09-07T06:31:21.1359226Z #22 423.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1368072Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:21.1373340Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1374580Z #22 423.7 ptxas info : Used 71 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:21.1375593Z #22 423.7 ptxas info : Compile time = 24.368 ms 2025-09-07T06:31:21.1380408Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1389131Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 
2025-09-07T06:31:21.1394074Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1395263Z #22 423.7 ptxas info : Used 47 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:21.1396273Z #22 423.7 ptxas info : Compile time = 15.518 ms 2025-09-07T06:31:21.1807903Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1815783Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:21.1820022Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1821224Z #22 423.7 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:21.1822044Z #22 423.7 ptxas info : Compile time = 566.160 ms 2025-09-07T06:31:21.1824079Z #22 423.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:21.1827450Z #22 423.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 
2025-09-07T06:31:21.1829831Z #22 423.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:21.1830727Z #22 423.7 ptxas info : Used 52 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:21.1831540Z #22 423.7 ptxas info : Compile time = 20.565 ms 2025-09-07T06:31:25.8808536Z #22 428.6 [33/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 
-D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:25.8826069Z #22 428.6 ptxas info : 10 bytes gmem 2025-09-07T06:31:25.8830546Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.8838684Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.8843305Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.8844190Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.8844963Z #22 428.6 ptxas info : Compile time = 2.203 ms 2025-09-07T06:31:25.8849413Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.8857528Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.8861574Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.8862377Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.8863001Z #22 428.6 ptxas info : Compile time = 1.048 ms 2025-09-07T06:31:25.8867108Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.8874776Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.8878773Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.8879597Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.8880252Z #22 428.6 ptxas info : Compile time = 0.717 ms 2025-09-07T06:31:25.8884187Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.8891648Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.8896309Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.8897168Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.8897887Z #22 428.6 ptxas info : Compile time = 0.670 ms 2025-09-07T06:31:25.8902210Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.8910088Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.8914018Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.8914846Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.8915538Z #22 428.6 ptxas info : Compile time = 0.671 ms 2025-09-07T06:31:25.8919861Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.8932623Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.8936908Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.8937640Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.8938257Z #22 428.6 ptxas info : Compile time = 0.633 ms 2025-09-07T06:31:25.8942357Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.8950649Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.8954937Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.8955833Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.8956937Z #22 428.6 ptxas info : Compile time = 0.703 ms 2025-09-07T06:31:25.8961030Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.8968506Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.8972895Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.8973768Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.8974435Z #22 428.6 ptxas info : Compile time = 0.622 ms 2025-09-07T06:31:25.8978477Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.8984948Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:25.8988401Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.8989359Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.8989974Z #22 428.6 ptxas info : Compile time = 0.682 ms 2025-09-07T06:31:25.8993671Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9000664Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9004721Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9005575Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9006275Z #22 428.6 ptxas info : Compile time = 0.679 ms 2025-09-07T06:31:25.9008401Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9011853Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9014377Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:31:25.9015258Z #22 428.6 ptxas info : Used 56 registers, used 0 barriers 2025-09-07T06:31:25.9015963Z #22 428.6 ptxas info : Compile time = 31.912 ms 2025-09-07T06:31:25.9020151Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9027708Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9031897Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9032748Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9033455Z #22 428.6 ptxas info : Compile time = 0.994 ms 2025-09-07T06:31:25.9037123Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9043639Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9047515Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9048336Z #22 428.6 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:31:25.9049050Z #22 428.6 ptxas info : Compile time = 28.279 ms 2025-09-07T06:31:25.9053461Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9060954Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9065123Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9065975Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9066672Z #22 428.6 ptxas info : Compile time = 1.021 ms 2025-09-07T06:31:25.9068774Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9072479Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9074569Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9075420Z #22 428.6 ptxas info : Used 60 registers, used 0 barriers 2025-09-07T06:31:25.9076119Z #22 428.6 ptxas info : Compile time = 34.533 ms 2025-09-07T06:31:25.9080177Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9087745Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9091806Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9092998Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9093606Z #22 428.6 ptxas info : Compile time = 1.047 ms 2025-09-07T06:31:25.9097042Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9103234Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9106944Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9107800Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9108382Z #22 428.6 ptxas info : Compile time = 0.814 ms 2025-09-07T06:31:25.9111814Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9118038Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:25.9121491Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9122221Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9122824Z #22 428.6 ptxas info : Compile time = 0.733 ms 2025-09-07T06:31:25.9126717Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9134254Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9138384Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9139222Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9139943Z #22 428.6 ptxas info : Compile time = 0.666 ms 2025-09-07T06:31:25.9144027Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9151325Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9155367Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9156192Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9156861Z #22 428.6 ptxas info : Compile time = 0.681 ms 2025-09-07T06:31:25.9161087Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9168641Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9172944Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9173810Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9174529Z #22 428.6 ptxas info : Compile time = 0.651 ms 2025-09-07T06:31:25.9178606Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9186178Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9190238Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9191071Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9191780Z #22 428.6 ptxas info : Compile time = 0.643 ms 2025-09-07T06:31:25.9196175Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9203565Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:25.9207696Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9208564Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9209277Z #22 428.6 ptxas info : Compile time = 0.648 ms 2025-09-07T06:31:25.9214028Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9221151Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9226210Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9227206Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9318835Z #22 428.6 ptxas info : Compile time = 0.634 ms 2025-09-07T06:31:25.9322275Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9328329Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9331726Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9332645Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9333261Z #22 428.6 ptxas info : Compile time = 0.664 ms 2025-09-07T06:31:25.9335372Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9338275Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9340102Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9340822Z #22 428.6 ptxas info : Used 106 registers, used 0 barriers 2025-09-07T06:31:25.9341452Z #22 428.6 ptxas info : Compile time = 100.205 ms 2025-09-07T06:31:25.9344890Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9351162Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9354675Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9355395Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9356007Z #22 428.6 ptxas info : Compile time = 1.030 ms 2025-09-07T06:31:25.9359825Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9365756Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9382716Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9383781Z #22 428.6 ptxas info : Used 86 registers, used 1 barriers 2025-09-07T06:31:25.9384490Z #22 428.6 ptxas info : Compile time = 50.117 ms 2025-09-07T06:31:25.9388525Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9396160Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9400317Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9401162Z #22 428.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:25.9402163Z #22 428.6 ptxas info : Compile time = 1.019 ms 2025-09-07T06:31:25.9404268Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:25.9407722Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9409923Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9410746Z #22 428.6 ptxas info : Used 96 registers, used 0 barriers 2025-09-07T06:31:25.9411468Z #22 428.6 ptxas info : Compile time = 71.612 ms 2025-09-07T06:31:25.9412171Z #22 428.6 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:25.9416535Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9424232Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9428335Z #22 428.6 184 bytes stack frame, 188 bytes spill stores, 232 bytes spill loads 2025-09-07T06:31:25.9429527Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 184 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9430570Z #22 428.6 ptxas info : Compile time = 552.133 ms 2025-09-07T06:31:25.9434628Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9442445Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9446651Z #22 428.6 176 bytes stack frame, 184 bytes spill stores, 224 bytes spill loads 2025-09-07T06:31:25.9447885Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9448923Z #22 428.6 ptxas info : Compile time = 507.766 ms 2025-09-07T06:31:25.9453276Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9461040Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9465247Z #22 428.6 176 bytes stack frame, 188 bytes spill stores, 200 bytes spill loads 2025-09-07T06:31:25.9466453Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9467518Z #22 428.6 ptxas info : Compile time = 616.541 ms 2025-09-07T06:31:25.9471708Z #22 428.6 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9479117Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9483281Z #22 428.6 176 bytes stack frame, 188 bytes spill stores, 196 bytes spill loads 2025-09-07T06:31:25.9484484Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9485527Z #22 428.6 ptxas info : Compile time = 510.123 ms 2025-09-07T06:31:25.9489751Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9497884Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9502364Z #22 428.6 208 bytes stack frame, 220 bytes spill stores, 272 bytes spill loads 2025-09-07T06:31:25.9503559Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9504591Z #22 428.6 ptxas info : Compile time = 648.630 ms 2025-09-07T06:31:25.9508707Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9516220Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9520698Z #22 428.6 208 bytes stack frame, 220 bytes spill stores, 264 bytes spill loads 2025-09-07T06:31:25.9521880Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9522916Z #22 428.6 ptxas info : Compile time = 529.266 ms 2025-09-07T06:31:25.9526986Z #22 428.6 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9534594Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9538809Z #22 428.6 216 bytes stack frame, 216 bytes spill stores, 232 bytes spill loads 2025-09-07T06:31:25.9540044Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9541066Z #22 428.6 ptxas info : Compile time = 643.536 ms 2025-09-07T06:31:25.9545226Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9552747Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9557150Z #22 428.6 216 bytes stack frame, 216 bytes spill stores, 228 bytes spill loads 2025-09-07T06:31:25.9558324Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9559459Z #22 428.6 ptxas info : Compile time = 549.245 ms 2025-09-07T06:31:25.9563470Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9570708Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9574982Z #22 428.6 216 bytes stack frame, 224 bytes spill stores, 264 bytes spill loads 2025-09-07T06:31:25.9576207Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9577213Z #22 428.6 ptxas info : Compile time = 597.863 ms 2025-09-07T06:31:25.9581482Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9588695Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9592970Z #22 428.6 216 bytes stack frame, 224 bytes spill stores, 256 bytes spill loads 2025-09-07T06:31:25.9594200Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9595235Z #22 428.6 ptxas info : Compile time = 500.765 ms 2025-09-07T06:31:25.9597307Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9600783Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9602936Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9603902Z #22 428.6 ptxas info : Used 60 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:25.9604733Z #22 428.6 ptxas info : Compile time = 25.476 ms 2025-09-07T06:31:25.9608833Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9616660Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9621048Z #22 428.6 208 bytes stack frame, 208 bytes spill stores, 220 bytes spill loads 2025-09-07T06:31:25.9622262Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9623289Z #22 428.6 ptxas info : Compile time = 603.920 ms 2025-09-07T06:31:25.9626933Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9633436Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9637336Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9638308Z #22 428.6 ptxas info : Used 51 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:25.9639123Z #22 428.6 ptxas info : Compile time = 20.826 ms 2025-09-07T06:31:25.9643297Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9650778Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9655206Z #22 428.6 200 bytes stack frame, 200 bytes spill stores, 208 bytes spill loads 2025-09-07T06:31:25.9656417Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 200 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9657434Z #22 428.6 ptxas info : Compile time = 511.121 ms 2025-09-07T06:31:25.9659496Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9662904Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9665044Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9665952Z #22 428.6 ptxas info : Used 63 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:25.9666762Z #22 428.6 ptxas info : Compile time = 30.072 ms 2025-09-07T06:31:25.9670850Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9678204Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9682279Z #22 428.6 24 bytes stack frame, 20 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:25.9683418Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9684431Z #22 428.6 ptxas info : Compile time = 729.479 ms 2025-09-07T06:31:25.9688463Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9696435Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9700485Z #22 428.6 56 bytes stack frame, 52 bytes spill stores, 76 bytes spill loads 2025-09-07T06:31:25.9701629Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9702671Z #22 428.6 ptxas info : Compile time = 657.678 ms 2025-09-07T06:31:25.9706698Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9714043Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9718059Z #22 428.6 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:31:25.9719237Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9720249Z #22 428.6 ptxas info : Compile time = 786.366 ms 2025-09-07T06:31:25.9724283Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9732758Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9737282Z #22 428.6 56 bytes stack frame, 56 bytes spill stores, 56 bytes spill loads 2025-09-07T06:31:25.9738893Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9740008Z #22 428.6 ptxas info : Compile time = 677.252 ms 2025-09-07T06:31:25.9745240Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9753629Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9758219Z #22 428.6 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:31:25.9759521Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9760690Z #22 428.6 ptxas info : Compile time = 842.559 ms 2025-09-07T06:31:25.9765195Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9772101Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9776186Z #22 428.6 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:31:25.9777462Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9778333Z #22 428.6 ptxas info : Compile time = 760.125 ms 2025-09-07T06:31:25.9782331Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9789471Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9793591Z #22 428.6 64 bytes stack frame, 68 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:25.9794883Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9796175Z #22 428.6 ptxas info : Compile time = 872.383 ms 2025-09-07T06:31:25.9801322Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9807989Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9811441Z #22 428.6 64 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:31:25.9812570Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9813457Z #22 428.6 ptxas info : Compile time = 620.564 ms 2025-09-07T06:31:25.9817119Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9823106Z #22 428.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9826484Z #22 428.6 72 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:31:25.9827493Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9828400Z #22 428.6 ptxas info : Compile time = 666.413 ms 2025-09-07T06:31:25.9831698Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9837717Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9841687Z #22 428.6 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads 2025-09-07T06:31:25.9842677Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9843557Z #22 428.6 ptxas info : Compile time = 664.334 ms 2025-09-07T06:31:25.9845508Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9848372Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9850270Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9851073Z #22 428.6 ptxas info : Used 105 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:25.9851791Z #22 428.6 ptxas info : Compile time = 49.076 ms 2025-09-07T06:31:25.9855376Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9861583Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:25.9865198Z #22 428.6 56 bytes stack frame, 60 bytes spill stores, 64 bytes spill loads 2025-09-07T06:31:25.9866214Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9867646Z #22 428.6 ptxas info : Compile time = 713.469 ms 2025-09-07T06:31:25.9870826Z #22 428.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9876520Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9879713Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9880515Z #22 428.6 ptxas info : Used 86 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:25.9881231Z #22 428.6 ptxas info : Compile time = 28.872 ms 2025-09-07T06:31:25.9884649Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9890861Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:25.9894883Z #22 428.6 56 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads 2025-09-07T06:31:25.9895883Z #22 428.6 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:25.9896756Z #22 428.6 ptxas info : Compile time = 453.274 ms 2025-09-07T06:31:25.9898705Z #22 428.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:25.9901603Z #22 428.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:25.9903413Z #22 428.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:25.9904221Z #22 428.6 ptxas info : Used 103 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:25.9904930Z #22 428.6 ptxas info : Compile time = 38.776 ms 2025-09-07T06:31:26.4579659Z #22 429.1 [34/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim96_fp16_sm80.o 
-D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:26.4597284Z #22 429.1 ptxas info : 10 bytes gmem 2025-09-07T06:31:26.4602031Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4610174Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4614746Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4615530Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4616199Z #22 429.1 ptxas info : Compile time = 2.192 ms 2025-09-07T06:31:26.4620305Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4627919Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4632816Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4633789Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4634576Z #22 429.1 ptxas info : Compile time = 1.068 ms 2025-09-07T06:31:26.4638863Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4646980Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4651259Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4652210Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4653093Z #22 429.1 ptxas info : Compile time = 0.736 ms 2025-09-07T06:31:26.4657236Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4664492Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4668450Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4669228Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4669896Z #22 429.1 ptxas info : Compile time = 0.666 ms 2025-09-07T06:31:26.4673865Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4682709Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4687681Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4688518Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4689214Z #22 429.1 ptxas info : Compile time = 0.642 ms 2025-09-07T06:31:26.4694301Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4701829Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4706068Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4706918Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4707676Z #22 429.1 ptxas info : Compile time = 0.594 ms 2025-09-07T06:31:26.4712132Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4720557Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4725203Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4726008Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4726636Z #22 429.1 ptxas info : Compile time = 0.610 ms 2025-09-07T06:31:26.4730786Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4739368Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4743893Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4744888Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4746704Z #22 429.1 ptxas info : Compile time = 0.576 ms 2025-09-07T06:31:26.4750739Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4757048Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4761238Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4762111Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4762845Z #22 429.1 ptxas info : Compile time = 0.708 ms 2025-09-07T06:31:26.4767261Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4774997Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4779690Z 
#22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4780560Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4781271Z #22 429.1 ptxas info : Compile time = 0.617 ms 2025-09-07T06:31:26.4786550Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4794952Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4799676Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4800623Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4801442Z #22 429.1 ptxas info : Compile time = 0.644 ms 2025-09-07T06:31:26.4805153Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4812311Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4815978Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4816759Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4817449Z #22 429.1 ptxas info : Compile time = 0.675 ms 2025-09-07T06:31:26.4821605Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4829492Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4833761Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4834644Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4835346Z #22 429.1 ptxas info : Compile time = 0.615 ms 2025-09-07T06:31:26.4839507Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4847138Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4851377Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4852226Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4853130Z #22 429.1 ptxas info : Compile time = 0.651 ms 2025-09-07T06:31:26.4857437Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4865068Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4869731Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4870623Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4871313Z #22 429.1 ptxas info : Compile time = 0.607 ms 2025-09-07T06:31:26.4875569Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4883303Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4887577Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4888435Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4889392Z #22 429.1 ptxas info : Compile time = 0.611 ms 2025-09-07T06:31:26.4894073Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4901732Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4906051Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4906878Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4907648Z #22 429.1 ptxas info : Compile time = 0.641 ms 2025-09-07T06:31:26.4912126Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4920128Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4923865Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4924799Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4925393Z #22 429.1 ptxas info : Compile time = 0.622 ms 2025-09-07T06:31:26.4928929Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4935675Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4939222Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4939961Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4940572Z #22 429.1 ptxas info : Compile time = 0.636 ms 2025-09-07T06:31:26.4944353Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4950693Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4954530Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4955381Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4956095Z #22 429.1 ptxas info : Compile time = 0.638 ms 2025-09-07T06:31:26.4960163Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4967604Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:26.4971733Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4972784Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4973487Z #22 429.1 ptxas info : Compile time = 0.621 ms 2025-09-07T06:31:26.4977603Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4985057Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.4989318Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.4990162Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.4990912Z #22 429.1 ptxas info : Compile time = 0.643 ms 2025-09-07T06:31:26.4993805Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.4997276Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:26.4999408Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:31:26.5000269Z #22 429.1 ptxas info : Used 48 registers, used 0 barriers 2025-09-07T06:31:26.5001012Z #22 429.1 ptxas info : Compile time = 44.035 ms 2025-09-07T06:31:26.5006703Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.5014173Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5018990Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5019814Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.5020483Z #22 429.1 ptxas info : Compile time = 0.867 ms 2025-09-07T06:31:26.5024222Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.5030920Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:26.5034642Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5035475Z #22 429.1 ptxas info : Used 68 registers, used 1 barriers 2025-09-07T06:31:26.5036167Z #22 429.1 ptxas info : Compile time = 31.751 ms 2025-09-07T06:31:26.5039840Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.5046550Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:26.5051089Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5051931Z #22 429.1 ptxas info : Used 44 registers, used 1 barriers 2025-09-07T06:31:26.5052717Z #22 429.1 ptxas info : Compile time = 18.561 ms 2025-09-07T06:31:26.5056788Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.5064828Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5069185Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5070051Z #22 429.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:26.5070763Z #22 429.1 ptxas info : Compile time = 1.038 ms 2025-09-07T06:31:26.5072917Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:26.5076539Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:26.5078740Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5079651Z #22 429.1 ptxas info : Used 48 registers, used 0 barriers 2025-09-07T06:31:26.5080391Z #22 429.1 ptxas info : Compile time = 24.879 ms 2025-09-07T06:31:26.5081105Z #22 429.1 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:26.5085540Z #22 429.1 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5093826Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5098218Z #22 429.1 112 bytes stack frame, 104 bytes spill stores, 160 bytes spill loads 2025-09-07T06:31:26.5099626Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5100709Z #22 429.1 ptxas info : Compile time = 662.835 ms 2025-09-07T06:31:26.5105062Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5113278Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5117652Z #22 429.1 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:26.5118892Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5119971Z #22 429.1 ptxas info : Compile time = 630.324 ms 2025-09-07T06:31:26.5124629Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5132797Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5137797Z #22 429.1 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:26.5138887Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5139829Z #22 429.1 ptxas info : Compile time = 676.670 ms 2025-09-07T06:31:26.5143685Z #22 429.1 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5151366Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5155241Z #22 429.1 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:26.5156315Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5157274Z #22 429.1 ptxas info : Compile time = 633.873 ms 2025-09-07T06:31:26.5161030Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5168250Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5172748Z #22 429.1 80 bytes stack frame, 76 bytes spill stores, 100 bytes spill loads 2025-09-07T06:31:26.5173985Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5175064Z #22 429.1 ptxas info : Compile time = 708.554 ms 2025-09-07T06:31:26.5179266Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5187653Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5191805Z #22 429.1 72 bytes stack frame, 72 bytes spill stores, 96 bytes spill loads 2025-09-07T06:31:26.5193515Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5194548Z #22 429.1 ptxas info : Compile time = 670.362 ms 2025-09-07T06:31:26.5198816Z #22 429.1 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5206723Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5210769Z #22 429.1 72 bytes stack frame, 76 bytes spill stores, 80 bytes spill loads 2025-09-07T06:31:26.5212109Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5213472Z #22 429.1 ptxas info : Compile time = 715.477 ms 2025-09-07T06:31:26.5217984Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5226106Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5230411Z #22 429.1 72 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads 2025-09-07T06:31:26.5231519Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5232522Z #22 429.1 ptxas info : Compile time = 648.890 ms 2025-09-07T06:31:26.5236504Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5243816Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5247918Z #22 429.1 40 bytes stack frame, 40 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:26.5249406Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5250447Z #22 429.1 ptxas info : Compile time = 646.258 ms 2025-09-07T06:31:26.5255007Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5263935Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5269153Z #22 429.1 40 bytes stack frame, 40 bytes spill stores, 64 bytes spill loads 2025-09-07T06:31:26.5270665Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5272005Z #22 429.1 ptxas info : Compile time = 615.205 ms 2025-09-07T06:31:26.5277478Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5286622Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5291007Z #22 429.1 40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads 2025-09-07T06:31:26.5292593Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5293771Z #22 429.1 ptxas info : Compile time = 657.285 ms 2025-09-07T06:31:26.5297856Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5307134Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5313266Z #22 429.1 40 bytes stack frame, 36 bytes spill stores, 36 bytes spill loads 2025-09-07T06:31:26.5314748Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5316048Z #22 429.1 ptxas info : Compile time = 610.828 ms 2025-09-07T06:31:26.5321463Z #22 429.1 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5330419Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5336062Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5337255Z #22 429.1 ptxas info : Used 250 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:26.5338283Z #22 429.1 ptxas info : Compile time = 542.879 ms 2025-09-07T06:31:26.5343038Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5352364Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5357833Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5358726Z #22 429.1 ptxas info : Used 240 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:26.5359484Z #22 429.1 ptxas info : Compile time = 520.318 ms 2025-09-07T06:31:26.5363811Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5371404Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5375949Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5376960Z #22 429.1 ptxas info : Used 246 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:26.5377793Z #22 429.1 ptxas info : Compile time = 559.759 ms 2025-09-07T06:31:26.5382201Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5390048Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5394622Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5395543Z #22 429.1 ptxas info : Used 246 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:26.5396345Z #22 429.1 ptxas info : Compile time = 521.194 ms 2025-09-07T06:31:26.5400612Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5409183Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5413684Z #22 429.1 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:26.5414932Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5416033Z #22 429.1 ptxas info : Compile time = 678.765 ms 2025-09-07T06:31:26.5420797Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5430885Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5436520Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5437728Z #22 429.1 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:26.5438798Z #22 429.1 ptxas info : Compile time = 573.399 ms 2025-09-07T06:31:26.5445025Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5455221Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5459647Z #22 429.1 24 bytes stack frame, 28 bytes spill stores, 32 bytes spill loads 2025-09-07T06:31:26.5460901Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5462128Z #22 429.1 ptxas info : Compile time = 690.653 ms 2025-09-07T06:31:26.5466901Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5476708Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5481815Z #22 429.1 24 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:31:26.5483292Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:26.5484452Z #22 429.1 ptxas info : Compile time = 640.033 ms 2025-09-07T06:31:26.5489015Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5498898Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5505492Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5506447Z #22 429.1 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:26.5507667Z #22 429.1 ptxas info : Compile time = 596.426 ms 2025-09-07T06:31:26.5512922Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5521271Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5525328Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5526282Z #22 429.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:26.5527075Z #22 429.1 ptxas info : Compile time = 528.018 ms 2025-09-07T06:31:26.5529421Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5533102Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:26.5535318Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5536309Z #22 429.1 ptxas info : Used 48 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:26.5537162Z #22 429.1 ptxas info : Compile time = 18.892 ms 2025-09-07T06:31:26.5541644Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5549370Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.5553523Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5554448Z #22 429.1 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:26.5555258Z #22 429.1 ptxas info : Compile time = 579.492 ms 2025-09-07T06:31:26.5559116Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5566188Z #22 429.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:26.5570228Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5571158Z #22 429.1 ptxas info : Used 71 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:26.5571967Z #22 429.1 ptxas info : Compile time = 24.545 ms 2025-09-07T06:31:26.5575753Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.5582080Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEESO_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:26.5585884Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.5587077Z #22 429.1 ptxas info : Used 47 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:26.5587887Z #22 429.1 ptxas info : Compile time = 16.001 ms 2025-09-07T06:31:26.6072993Z #22 429.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.6081153Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEENS7_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:26.6085380Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.6086193Z #22 429.1 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:26.6086914Z #22 429.1 ptxas info : Compile time = 538.410 ms 2025-09-07T06:31:26.6088700Z #22 429.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:26.6091607Z #22 429.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi96EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:26.6093908Z #22 429.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:26.6094757Z #22 429.1 ptxas info : Used 51 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:26.6095468Z #22 429.1 ptxas info : Compile time = 21.358 ms 2025-09-07T06:31:32.0976297Z #22 434.8 [35/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim256_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:32.0993985Z #22 434.8 ptxas info : 10 bytes gmem 2025-09-07T06:31:32.0998610Z #22 
434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1007091Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1011742Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1012793Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1013531Z #22 434.8 ptxas info : Compile time = 2.162 ms 2025-09-07T06:31:32.1018228Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1026405Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1030604Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1031765Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1032478Z #22 434.8 ptxas info : Compile time = 1.164 ms 2025-09-07T06:31:32.1036748Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1044769Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1049070Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1049937Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1050640Z #22 434.8 ptxas info : Compile time = 0.824 ms 2025-09-07T06:31:32.1055261Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1062974Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1067338Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1068210Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1068909Z #22 434.8 ptxas info : Compile time = 0.682 ms 2025-09-07T06:31:32.1074297Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1083007Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1087466Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1088290Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1088962Z #22 434.8 ptxas info : Compile time = 0.670 ms 2025-09-07T06:31:32.1093533Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1101393Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1105755Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1106616Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1107297Z #22 434.8 ptxas info : Compile time = 0.662 ms 2025-09-07T06:31:32.1111320Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1122724Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1126960Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1127775Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1128461Z #22 434.8 ptxas info : Compile time = 0.713 ms 2025-09-07T06:31:32.1132674Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1140282Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1144506Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1145339Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1146019Z #22 434.8 ptxas info : Compile time = 0.630 ms 2025-09-07T06:31:32.1150026Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1157337Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1161642Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1162492Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1163315Z #22 434.8 ptxas info : Compile time = 0.683 ms 2025-09-07T06:31:32.1167394Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1175031Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1179086Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1179923Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1180612Z #22 434.8 ptxas info : Compile time = 0.621 ms 2025-09-07T06:31:32.1182959Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1186481Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.1188697Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1189546Z #22 434.8 ptxas info : Used 60 registers, used 0 barriers 2025-09-07T06:31:32.1190254Z #22 434.8 ptxas info : Compile time = 30.368 ms 2025-09-07T06:31:32.1194771Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1202653Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1207521Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1208538Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1209355Z #22 434.8 ptxas info : Compile time = 0.995 ms 2025-09-07T06:31:32.1213865Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1221271Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE 
2025-09-07T06:31:32.1225327Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1226148Z #22 434.8 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:31:32.1226839Z #22 434.8 ptxas info : Compile time = 28.701 ms 2025-09-07T06:31:32.1230922Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1237916Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1241667Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1242468Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1243433Z #22 434.8 ptxas info : Compile time = 0.984 ms 2025-09-07T06:31:32.1245362Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1248507Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:32.1250364Z #22 434.8 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1251244Z #22 434.8 ptxas info : Used 61 registers, used 0 barriers 2025-09-07T06:31:32.1251863Z #22 434.8 ptxas info : Compile time = 34.199 ms 2025-09-07T06:31:32.1255691Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1262264Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1266469Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1267319Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1268042Z #22 434.8 ptxas info : Compile time = 1.080 ms 2025-09-07T06:31:32.1272251Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1280280Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1284789Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1285697Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1286444Z #22 434.8 ptxas info : Compile time = 0.870 ms 2025-09-07T06:31:32.1290818Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1301080Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1305492Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1306379Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1307111Z #22 434.8 ptxas info : Compile time = 0.788 ms 2025-09-07T06:31:32.1311531Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1331435Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1335985Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1336886Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1337612Z #22 434.8 ptxas info : Compile time = 0.757 ms 2025-09-07T06:31:32.1341940Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1349900Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1354524Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1355420Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1356290Z #22 434.8 ptxas info : Compile time = 0.759 ms 2025-09-07T06:31:32.1360712Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1368804Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1373332Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1374234Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1374961Z #22 434.8 ptxas info : Compile time = 0.682 ms 2025-09-07T06:31:32.1379527Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1387635Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1392776Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1393675Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1394415Z #22 434.8 ptxas info : Compile time = 0.675 ms 2025-09-07T06:31:32.1398749Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1407494Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1411875Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1412878Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1413624Z #22 434.8 ptxas info : Compile time = 0.658 ms 2025-09-07T06:31:32.1417889Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1425893Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1430075Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1430967Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1431714Z #22 434.8 ptxas info : Compile time = 0.759 ms 2025-09-07T06:31:32.1435917Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1443879Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1448128Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1449006Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1449758Z #22 434.8 ptxas info : Compile time = 0.619 ms 2025-09-07T06:31:32.1452022Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1455978Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.1458272Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1459170Z #22 434.8 ptxas info : Used 124 registers, used 0 barriers 2025-09-07T06:31:32.1459949Z #22 434.8 ptxas info : Compile time = 120.179 ms 2025-09-07T06:31:32.1464392Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1472496Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1476809Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1477686Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1478550Z #22 434.8 ptxas info : Compile time = 1.150 ms 2025-09-07T06:31:32.1482582Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1490039Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 
2025-09-07T06:31:32.1494526Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1495420Z #22 434.8 ptxas info : Used 86 registers, used 1 barriers 2025-09-07T06:31:32.1496162Z #22 434.8 ptxas info : Compile time = 102.868 ms 2025-09-07T06:31:32.1500499Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1508677Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1512973Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1513863Z #22 434.8 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.1514594Z #22 434.8 ptxas info : Compile time = 0.980 ms 2025-09-07T06:31:32.1516858Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.1520543Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:32.1522819Z #22 434.8 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads 2025-09-07T06:31:32.1523719Z #22 434.8 ptxas info : Used 122 registers, used 0 barriers 2025-09-07T06:31:32.1524474Z #22 434.8 ptxas info : Compile time = 108.392 ms 2025-09-07T06:31:32.1525183Z #22 434.8 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:32.1529589Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1538162Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1543071Z #22 434.8 184 bytes stack frame, 188 bytes spill stores, 232 bytes spill loads 2025-09-07T06:31:32.1545000Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 184 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1546283Z #22 434.8 ptxas info : Compile time = 548.692 ms 2025-09-07T06:31:32.1550901Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 
'sm_80' 2025-09-07T06:31:32.1560080Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1564615Z #22 434.8 176 bytes stack frame, 184 bytes spill stores, 224 bytes spill loads 2025-09-07T06:31:32.1565892Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1567003Z #22 434.8 ptxas info : Compile time = 508.167 ms 2025-09-07T06:31:32.1571726Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1580154Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1584767Z #22 434.8 176 bytes stack frame, 188 bytes spill stores, 200 bytes spill loads 2025-09-07T06:31:32.1586014Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1587149Z #22 
434.8 ptxas info : Compile time = 615.794 ms 2025-09-07T06:31:32.1591689Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1600205Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1604824Z #22 434.8 176 bytes stack frame, 188 bytes spill stores, 196 bytes spill loads 2025-09-07T06:31:32.1606148Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 176 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1607504Z #22 434.8 ptxas info : Compile time = 511.996 ms 2025-09-07T06:31:32.1612133Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1620725Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1625360Z #22 434.8 208 bytes stack frame, 220 bytes spill stores, 272 bytes spill loads 2025-09-07T06:31:32.1626649Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1627786Z #22 434.8 ptxas info : Compile time = 648.635 ms 2025-09-07T06:31:32.1632596Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1640996Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1645577Z #22 434.8 208 bytes stack frame, 220 bytes spill stores, 264 bytes spill loads 2025-09-07T06:31:32.1646848Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1647918Z #22 434.8 ptxas info : Compile time = 564.519 ms 2025-09-07T06:31:32.1652324Z #22 434.8 
ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1660528Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1665081Z #22 434.8 216 bytes stack frame, 216 bytes spill stores, 232 bytes spill loads 2025-09-07T06:31:32.1666330Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1667423Z #22 434.8 ptxas info : Compile time = 664.115 ms 2025-09-07T06:31:32.1672024Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1680340Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1684876Z #22 434.8 216 bytes stack frame, 216 bytes spill stores, 228 bytes spill loads 2025-09-07T06:31:32.1686138Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1687236Z #22 434.8 ptxas info : Compile time = 565.938 ms 2025-09-07T06:31:32.1692187Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1700740Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1705573Z #22 434.8 216 bytes stack frame, 224 bytes spill stores, 264 bytes spill loads 2025-09-07T06:31:32.1706953Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1708176Z #22 434.8 ptxas info : Compile time = 611.427 ms 2025-09-07T06:31:32.1712879Z #22 434.8 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1721623Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1726159Z #22 434.8 216 bytes stack frame, 224 bytes spill stores, 256 bytes spill loads 2025-09-07T06:31:32.1727359Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1728404Z #22 434.8 ptxas info : Compile time = 523.734 ms 2025-09-07T06:31:32.1731285Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1735058Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.1737542Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1738481Z #22 434.8 ptxas info : Used 61 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:32.1739299Z #22 434.8 ptxas info : Compile time = 27.653 ms 2025-09-07T06:31:32.1744735Z #22 434.8 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1753419Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1757812Z #22 434.8 208 bytes stack frame, 208 bytes spill stores, 220 bytes spill loads 2025-09-07T06:31:32.1759082Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 208 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1760174Z #22 434.8 ptxas info : Compile time = 628.211 ms 2025-09-07T06:31:32.1764319Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1771352Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi1EEENS5_ILi8EEESH_EEENS4_IJNS5_ILi0EEESH_SK_EEEEENS4_IJNS5_ILi16EEENS5_ILi128EEESN_EEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.1775402Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1776364Z #22 434.8 ptxas info : Used 51 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:32.1777187Z #22 434.8 ptxas info : Compile time = 20.976 ms 2025-09-07T06:31:32.1781541Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1789850Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi32EEENS7_ILi64EEENS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi1ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1794947Z #22 434.8 200 bytes stack frame, 200 bytes spill stores, 208 bytes spill loads 2025-09-07T06:31:32.1796127Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 200 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1797121Z #22 434.8 ptxas info : Compile time = 524.520 ms 2025-09-07T06:31:32.1799150Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1802785Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi32EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:32.1804929Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1805801Z #22 434.8 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:32.1806548Z #22 434.8 ptxas info : Compile time = 29.433 ms 2025-09-07T06:31:32.1810086Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1817072Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1820796Z #22 434.8 24 bytes stack frame, 20 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:32.1822172Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1823092Z #22 434.8 ptxas info : Compile time = 758.421 ms 2025-09-07T06:31:32.1827131Z #22 434.8 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1833887Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1839007Z #22 434.8 56 bytes stack frame, 52 bytes spill stores, 76 bytes spill loads 2025-09-07T06:31:32.1840255Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1841399Z #22 434.8 ptxas info : Compile time = 687.897 ms 2025-09-07T06:31:32.1846000Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1854697Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1859754Z #22 434.8 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:31:32.1861104Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1862301Z #22 434.8 ptxas info : Compile time = 811.875 ms 2025-09-07T06:31:32.1867497Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1876052Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1880096Z #22 434.8 56 bytes stack frame, 56 bytes spill stores, 56 bytes spill loads 2025-09-07T06:31:32.1881257Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1882268Z #22 434.8 ptxas info : Compile time = 715.201 ms 2025-09-07T06:31:32.1886572Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1893998Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1897953Z #22 434.8 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:31:32.1899073Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1900046Z #22 434.8 ptxas info : Compile time = 875.396 ms 2025-09-07T06:31:32.1903948Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1911001Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1914921Z #22 434.8 56 bytes stack frame, 56 bytes spill stores, 60 bytes spill loads 2025-09-07T06:31:32.1916048Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1917017Z #22 434.8 ptxas info : Compile time = 789.290 ms 2025-09-07T06:31:32.1921052Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1928195Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1932102Z #22 434.8 64 bytes stack frame, 68 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:32.1933364Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1934316Z #22 434.8 ptxas info : Compile time = 912.680 ms 2025-09-07T06:31:32.1938133Z #22 434.8 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1945247Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1949023Z #22 434.8 64 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:31:32.1950120Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1951091Z #22 434.8 ptxas info : Compile time = 769.943 ms 2025-09-07T06:31:32.1954756Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1961485Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1965308Z #22 434.8 72 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:31:32.1966421Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1967374Z #22 434.8 ptxas info : Compile time = 789.710 ms 2025-09-07T06:31:32.1971110Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1977978Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.1981734Z #22 434.8 64 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads 2025-09-07T06:31:32.1982812Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.1983730Z #22 434.8 ptxas info : Compile time = 691.775 ms 2025-09-07T06:31:32.1985680Z #22 434.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.1988951Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.1990981Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.1992181Z #22 434.8 ptxas info : Used 119 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:32.1992960Z #22 434.8 ptxas info : Compile time = 54.220 ms 2025-09-07T06:31:32.1997040Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.2003947Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.2007736Z #22 434.8 56 bytes stack frame, 60 bytes spill stores, 64 bytes spill loads 2025-09-07T06:31:32.2008817Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.2009785Z #22 434.8 ptxas info : Compile time = 803.088 ms 2025-09-07T06:31:32.2013524Z #22 434.8 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.2019802Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.2023320Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.2024230Z #22 434.8 ptxas info : Used 86 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:32.2024971Z #22 434.8 ptxas info : Compile time = 36.159 ms 2025-09-07T06:31:32.2028756Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.2035939Z #22 434.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.2039746Z #22 434.8 56 bytes stack frame, 60 bytes spill stores, 60 bytes spill loads 2025-09-07T06:31:32.2040838Z #22 434.8 ptxas info : Used 255 registers, used 2 barriers, 56 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.2041815Z #22 434.8 ptxas info : Compile time = 711.291 ms 2025-09-07T06:31:32.2043828Z #22 434.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.2464425Z #22 434.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi256EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:32.2466734Z #22 434.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.2468057Z #22 434.8 ptxas info : Used 121 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:32.2469003Z #22 434.8 ptxas info : Compile time = 64.810 ms 2025-09-07T06:31:32.4619000Z #22 435.1 [36/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim64_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:32.6196658Z #22 435.1 ptxas info : 10 bytes gmem 2025-09-07T06:31:32.6200973Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6211065Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6215399Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6358229Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6359060Z #22 435.1 ptxas info : Compile time = 2.269 ms 2025-09-07T06:31:32.6363477Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6372565Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:32.6377259Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6378121Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6378863Z #22 435.1 ptxas info : Compile time = 1.057 ms 2025-09-07T06:31:32.6383163Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6391121Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6397544Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6398476Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6399238Z #22 435.1 ptxas info : Compile time = 0.712 ms 2025-09-07T06:31:32.6403663Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6414778Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6419412Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6420448Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6421214Z #22 435.1 ptxas info : Compile time = 0.668 ms 2025-09-07T06:31:32.6425714Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6433358Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6437685Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6438545Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6439323Z #22 435.1 ptxas info : Compile time = 0.603 ms 2025-09-07T06:31:32.6443893Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6451715Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6456235Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6457174Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6457887Z #22 435.1 ptxas info : Compile time = 0.630 ms 2025-09-07T06:31:32.6462227Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6470014Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:32.6474554Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6475402Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6476112Z #22 435.1 ptxas info : Compile time = 0.702 ms 2025-09-07T06:31:32.6480513Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6488351Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6492999Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6493896Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6494628Z #22 435.1 ptxas info : Compile time = 0.608 ms 2025-09-07T06:31:32.6498696Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6506285Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6510417Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6511267Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6511988Z #22 435.1 ptxas info : Compile time = 0.618 ms 2025-09-07T06:31:32.6516089Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6523447Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6527497Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6528361Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6529087Z #22 435.1 ptxas info : Compile time = 0.731 ms 2025-09-07T06:31:32.6531146Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:31:32.6534679Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.6536776Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6538052Z #22 435.1 ptxas info : Used 39 registers, used 0 barriers 2025-09-07T06:31:32.6538934Z #22 435.1 ptxas info : Compile time = 44.110 ms 2025-09-07T06:31:32.6542920Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6550208Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6554083Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6554899Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6555538Z #22 435.1 ptxas info : Compile time = 0.983 ms 2025-09-07T06:31:32.6559329Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6565738Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.6569134Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6569946Z #22 435.1 ptxas info : Used 40 registers, used 1 barriers 2025-09-07T06:31:32.6570656Z #22 435.1 ptxas info : Compile time = 40.369 ms 2025-09-07T06:31:32.6574970Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6583222Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6587939Z #22 435.1 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6588872Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6589693Z #22 435.1 ptxas info : Compile time = 0.970 ms 2025-09-07T06:31:32.6592247Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6596114Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:32.6598718Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6599660Z #22 435.1 ptxas info : Used 43 registers, used 0 barriers 2025-09-07T06:31:32.6600466Z #22 435.1 ptxas info : Compile time = 26.116 ms 2025-09-07T06:31:32.6605462Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6614642Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6619364Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6620327Z #22 435.1 ptxas info : Used 4 
registers, used 0 barriers 2025-09-07T06:31:32.6621148Z #22 435.1 ptxas info : Compile time = 0.939 ms 2025-09-07T06:31:32.6625293Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6631904Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6635649Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6636428Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6637078Z #22 435.1 ptxas info : Compile time = 0.806 ms 2025-09-07T06:31:32.6640808Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6647659Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6651496Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6652284Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6653143Z #22 435.1 ptxas info : Compile time = 0.670 ms 2025-09-07T06:31:32.6656819Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6663574Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6667451Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6668268Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6668909Z #22 435.1 ptxas info : Compile time = 0.626 ms 2025-09-07T06:31:32.6672611Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6679568Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6683336Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6684104Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6684769Z #22 435.1 ptxas info : Compile time = 0.646 ms 2025-09-07T06:31:32.6688543Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6695836Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:32.6699571Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6700369Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6701005Z #22 435.1 ptxas info : Compile time = 0.596 ms 2025-09-07T06:31:32.6704755Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6711554Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6715546Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6716345Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6716979Z #22 435.1 ptxas info : Compile time = 0.601 ms 2025-09-07T06:31:32.6720875Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6727626Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6731462Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6732261Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6733105Z #22 435.1 ptxas info : Compile time = 0.638 ms 2025-09-07T06:31:32.6737023Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6746083Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6751756Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6752807Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6753641Z #22 435.1 ptxas info : Compile time = 0.579 ms 2025-09-07T06:31:32.6758467Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6767115Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6771760Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6772930Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6773773Z #22 435.1 ptxas info : Compile time = 0.577 ms 2025-09-07T06:31:32.6776329Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6780611Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.6783140Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6784241Z #22 435.1 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:31:32.6785090Z #22 435.1 ptxas info : Compile time = 34.079 ms 2025-09-07T06:31:32.6790006Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6799106Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6804110Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6805059Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6805815Z #22 435.1 ptxas info : Compile time = 0.932 ms 2025-09-07T06:31:32.6810444Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6818433Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.6822974Z #22 
435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6824012Z #22 435.1 ptxas info : Used 56 registers, used 1 barriers 2025-09-07T06:31:32.6824831Z #22 435.1 ptxas info : Compile time = 27.773 ms 2025-09-07T06:31:32.6829703Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6838629Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6842478Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6843280Z #22 435.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:32.6843936Z #22 435.1 ptxas info : Compile time = 0.966 ms 2025-09-07T06:31:32.6845874Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:32.6849357Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:32.6851522Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:31:32.6852286Z #22 435.1 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:31:32.6853151Z #22 435.1 ptxas info : Compile time = 40.119 ms 2025-09-07T06:31:32.6853789Z #22 435.1 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:32.6857577Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.6864515Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6868453Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6869599Z #22 435.1 ptxas info : Used 249 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.6870389Z #22 435.1 ptxas info : Compile time = 445.391 ms 2025-09-07T06:31:32.6874859Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.6881703Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6886139Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6887022Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.6887798Z #22 435.1 ptxas info : Compile time = 420.515 ms 2025-09-07T06:31:32.6891602Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.6898983Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6903105Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6904006Z #22 435.1 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.6904781Z #22 435.1 ptxas info : Compile time = 470.194 ms 2025-09-07T06:31:32.6908698Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.6915848Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6919709Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6920605Z #22 435.1 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.6921419Z #22 435.1 ptxas info : Compile time = 421.520 ms 2025-09-07T06:31:32.6925623Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.6932755Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6936632Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6937514Z #22 435.1 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.6938286Z #22 435.1 ptxas info : Compile time = 493.521 ms 2025-09-07T06:31:32.6942161Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.6949220Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6953181Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6954094Z #22 435.1 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.6954920Z #22 435.1 ptxas info : Compile time = 464.232 ms 2025-09-07T06:31:32.6958875Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.6965944Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6969954Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6970848Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.6971593Z #22 435.1 ptxas info : Compile time = 509.590 ms 2025-09-07T06:31:32.6975557Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.6983426Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.6988209Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.6989363Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.6990272Z #22 435.1 ptxas info : Compile time = 465.179 ms 2025-09-07T06:31:32.6995371Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7003912Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7008660Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.7009780Z #22 435.1 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.7010745Z #22 435.1 ptxas info : Compile time = 457.776 ms 2025-09-07T06:31:32.7015615Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7024116Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7029098Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.7030247Z #22 435.1 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.7031345Z #22 435.1 ptxas info : Compile time = 427.864 ms 2025-09-07T06:31:32.7033768Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7037650Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.7040121Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.7041168Z #22 435.1 ptxas info : Used 39 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:32.7042139Z #22 435.1 ptxas info : Compile time = 19.381 ms 2025-09-07T06:31:32.7047155Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7056682Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7061586Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.7062756Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.7063755Z #22 435.1 ptxas info : Compile time = 481.230 ms 2025-09-07T06:31:32.7067744Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7074654Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.7078595Z #22 
435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.7079637Z #22 435.1 ptxas info : Used 40 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:32.7080556Z #22 435.1 ptxas info : Compile time = 13.390 ms 2025-09-07T06:31:32.7084914Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7093152Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES8_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi4ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7097960Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.7099034Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:32.7099935Z #22 435.1 ptxas info : Compile time = 429.050 ms 2025-09-07T06:31:32.7102156Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7105700Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEES6_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:32.7107899Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:31:32.7108965Z #22 435.1 ptxas info : Used 45 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:32.7109886Z #22 435.1 ptxas info : Compile time = 20.862 ms 2025-09-07T06:31:32.7114534Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7122483Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7126947Z #22 435.1 96 bytes stack frame, 92 bytes spill stores, 148 bytes spill loads 2025-09-07T06:31:32.7128140Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7129235Z #22 435.1 ptxas info : Compile time = 803.224 ms 2025-09-07T06:31:32.7134163Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7142347Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7145959Z #22 435.1 64 bytes stack frame, 56 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:32.7147007Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7147934Z #22 435.1 ptxas info : Compile time = 781.794 ms 2025-09-07T06:31:32.7151640Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7159696Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7164766Z #22 435.1 72 bytes stack frame, 68 bytes spill stores, 72 bytes spill loads 2025-09-07T06:31:32.7166177Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7167422Z #22 435.1 ptxas info : Compile time = 833.308 ms 2025-09-07T06:31:32.7171838Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7180290Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7185010Z #22 435.1 64 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:31:32.7186329Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7187496Z #22 435.1 ptxas info : Compile time = 785.198 ms 2025-09-07T06:31:32.7192194Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7199559Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7203970Z #22 435.1 88 bytes stack frame, 92 bytes spill stores, 124 bytes spill loads 2025-09-07T06:31:32.7205088Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7206165Z #22 435.1 ptxas info : Compile time = 919.507 ms 2025-09-07T06:31:32.7210275Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7216625Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7220778Z #22 435.1 88 bytes stack frame, 84 bytes spill stores, 112 bytes spill loads 2025-09-07T06:31:32.7222151Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7223410Z #22 435.1 ptxas info : Compile time = 834.109 ms 2025-09-07T06:31:32.7228680Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7238533Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7244159Z #22 435.1 104 bytes stack frame, 104 bytes spill stores, 120 bytes spill loads 2025-09-07T06:31:32.7246057Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7247380Z #22 435.1 ptxas info : Compile time = 920.343 ms 2025-09-07T06:31:32.7252796Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7262522Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7267141Z #22 435.1 104 bytes stack frame, 104 bytes spill stores, 116 bytes spill loads 2025-09-07T06:31:32.7268269Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7269296Z #22 435.1 ptxas info : Compile time = 854.166 ms 2025-09-07T06:31:32.7273000Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7279802Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi2EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7283679Z #22 435.1 88 bytes stack frame, 88 bytes spill stores, 120 bytes spill loads 2025-09-07T06:31:32.7285002Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7285974Z #22 435.1 ptxas info : Compile time = 846.103 ms 2025-09-07T06:31:32.7289531Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7296150Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7300359Z #22 435.1 96 bytes stack frame, 96 bytes spill stores, 120 bytes spill loads 2025-09-07T06:31:32.7301686Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7302822Z #22 435.1 ptxas info : Compile time = 753.396 ms 2025-09-07T06:31:32.7305131Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7310925Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.7313313Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.7314365Z #22 435.1 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:32.7315221Z #22 435.1 ptxas info : Compile time = 30.012 ms 2025-09-07T06:31:32.7319682Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7328080Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi2EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7333035Z #22 435.1 96 bytes stack frame, 96 bytes spill stores, 100 bytes spill loads 2025-09-07T06:31:32.7334367Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7335548Z #22 435.1 ptxas info : Compile time = 793.051 ms 2025-09-07T06:31:32.7339810Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7347617Z #22 435.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJS7_NS5_ILi32EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:32.7352148Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.7353351Z #22 435.1 ptxas info : Used 56 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:32.7354220Z #22 435.1 ptxas info : Compile time = 19.675 ms 2025-09-07T06:31:32.7358725Z #22 435.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7365341Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi128EEES8_NS7_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi4ELi4ELi4ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:32.7369029Z #22 435.1 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads 2025-09-07T06:31:32.7370111Z #22 435.1 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:32.7371272Z #22 435.1 ptxas info : Compile time = 744.595 ms 2025-09-07T06:31:32.7373998Z #22 435.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:32.7377067Z #22 435.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi128EEENS5_ILi64EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:32.7379003Z #22 435.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:32.7379884Z #22 435.1 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:32.7380641Z #22 435.1 ptxas info : Compile time = 32.889 ms 2025-09-07T06:31:40.9327708Z #22 443.6 [37/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 
--ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:41.0934439Z #22 443.6 ptxas info : 10 bytes gmem 2025-09-07T06:31:41.0939064Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.0947123Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.0951642Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.0952649Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.0953496Z #22 443.6 ptxas info : Compile time = 2.036 ms 2025-09-07T06:31:41.0958271Z #22 443.6 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.0966257Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.0970796Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.0971832Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.0972852Z #22 443.6 ptxas info : Compile time = 0.969 ms 2025-09-07T06:31:41.0977171Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.0985062Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.0989585Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.0990558Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.0991393Z #22 443.6 ptxas info : Compile time = 0.626 ms 2025-09-07T06:31:41.0996122Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1003984Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1008418Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1009403Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1010224Z #22 443.6 ptxas info : Compile time = 0.559 ms 2025-09-07T06:31:41.1014639Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1022851Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1027277Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1028265Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1029098Z #22 443.6 ptxas info : Compile time = 0.572 ms 2025-09-07T06:31:41.1033266Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1040770Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:41.1044984Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1045986Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1046817Z #22 443.6 ptxas info : Compile time = 0.535 ms 2025-09-07T06:31:41.1051153Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1059274Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1064173Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1065195Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1066025Z #22 443.6 ptxas info : Compile time = 0.550 ms 2025-09-07T06:31:41.1070381Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1078221Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1082771Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1083756Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1084806Z #22 443.6 ptxas info : Compile time = 0.584 ms 2025-09-07T06:31:41.1089015Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1207747Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1211974Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1213041Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1213749Z #22 443.6 ptxas info : Compile time = 0.547 ms 2025-09-07T06:31:41.1217722Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1225101Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1229243Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1230143Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1230846Z #22 443.6 ptxas info : Compile time = 0.625 ms 2025-09-07T06:31:41.1235408Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1243114Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1247391Z #22 443.6 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1248266Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1248986Z #22 443.6 ptxas info : Compile time = 0.561 ms 2025-09-07T06:31:41.1253462Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1261442Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1265686Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1266573Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1267289Z #22 443.6 ptxas info : Compile time = 0.553 ms 2025-09-07T06:31:41.1271607Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1279362Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1283398Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1284247Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1284958Z #22 443.6 ptxas info : Compile time = 0.625 ms 2025-09-07T06:31:41.1289133Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1297565Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1302069Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1302914Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1303615Z #22 443.6 ptxas info : Compile time = 0.549 ms 2025-09-07T06:31:41.1307927Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1316075Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1320905Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1321850Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1322922Z #22 443.6 ptxas info : Compile time = 0.532 ms 2025-09-07T06:31:41.1327638Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1336286Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1341078Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1341925Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1342592Z #22 443.6 ptxas info : Compile time = 0.540 ms 2025-09-07T06:31:41.1346606Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1354150Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1358351Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1359389Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1360085Z #22 443.6 ptxas info : Compile time = 0.538 ms 2025-09-07T06:31:41.1364293Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1372120Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1376555Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1377391Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1378055Z #22 443.6 ptxas info : Compile time = 0.551 ms 2025-09-07T06:31:41.1382485Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1390319Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1397453Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1398663Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1399631Z #22 443.6 ptxas info : Compile time = 0.552 ms 2025-09-07T06:31:41.1405737Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1416523Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1422645Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1423710Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1424550Z #22 443.6 ptxas info : Compile time = 0.542 ms 2025-09-07T06:31:41.1429640Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1440304Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1445630Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1446701Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1447597Z #22 443.6 ptxas info : Compile time = 0.538 ms 2025-09-07T06:31:41.1452819Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1459415Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1462997Z 
#22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1463675Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1464244Z #22 443.6 ptxas info : Compile time = 0.603 ms 2025-09-07T06:31:41.1465906Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1468578Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:41.1470298Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1470962Z #22 443.6 ptxas info : Used 76 registers, used 0 barriers 2025-09-07T06:31:41.1471540Z #22 443.6 ptxas info : Compile time = 37.813 ms 2025-09-07T06:31:41.1474878Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1480835Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1484101Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:31:41.1484803Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1485364Z #22 443.6 ptxas info : Compile time = 0.962 ms 2025-09-07T06:31:41.1488319Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1494263Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:41.1498706Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1499819Z #22 443.6 ptxas info : Used 80 registers, used 1 barriers 2025-09-07T06:31:41.1500789Z #22 443.6 ptxas info : Compile time = 74.143 ms 2025-09-07T06:31:41.1506160Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1515912Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:41.1520526Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1521576Z #22 443.6 ptxas info : Used 70 registers, used 1 barriers 2025-09-07T06:31:41.1522420Z #22 443.6 ptxas info : Compile time = 53.702 ms 2025-09-07T06:31:41.1527737Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1537607Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1542762Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1543816Z #22 443.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:41.1544662Z #22 443.6 ptxas info : Compile time = 0.982 ms 2025-09-07T06:31:41.1547336Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:41.1551646Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:41.1553321Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1554002Z #22 443.6 ptxas info : Used 74 registers, used 0 barriers 2025-09-07T06:31:41.1554830Z #22 443.6 ptxas info : Compile time = 96.938 ms 2025-09-07T06:31:41.1555383Z #22 443.6 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:41.1558564Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1564440Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1567615Z #22 443.6 24 bytes stack frame, 24 bytes spill stores, 24 bytes spill loads 2025-09-07T06:31:41.1568551Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1569366Z #22 443.6 ptxas info : Compile time = 570.907 ms 
2025-09-07T06:31:41.1573079Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1578813Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1582004Z #22 443.6 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:41.1582938Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1583761Z #22 443.6 ptxas info : Compile time = 541.928 ms 2025-09-07T06:31:41.1586978Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1594085Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1598829Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1599988Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.1600962Z #22 443.6 ptxas info : Compile time = 618.319 ms 2025-09-07T06:31:41.1605884Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1613631Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1617886Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1618860Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.1619665Z #22 443.6 ptxas info : Compile time = 569.217 ms 2025-09-07T06:31:41.1623726Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1631154Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1635471Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1636399Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.1637211Z #22 443.6 ptxas info : Compile time = 653.641 ms 2025-09-07T06:31:41.1641258Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1648709Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1653005Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1653956Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.1654758Z #22 443.6 ptxas info : Compile time = 603.183 ms 2025-09-07T06:31:41.1658898Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1666335Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1670649Z #22 443.6 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:31:41.1671791Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1672985Z #22 443.6 ptxas info : Compile time = 758.604 ms 2025-09-07T06:31:41.1677052Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1684499Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1688785Z #22 443.6 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:31:41.1690010Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1691080Z #22 443.6 ptxas info : Compile time = 720.290 ms 2025-09-07T06:31:41.1695863Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1703608Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1707603Z #22 443.6 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:31:41.1708770Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1709844Z #22 443.6 ptxas info : Compile time = 639.491 ms 2025-09-07T06:31:41.1713857Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1721729Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1726342Z #22 443.6 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:31:41.1727715Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1728927Z #22 443.6 ptxas info : Compile time = 586.706 ms 2025-09-07T06:31:41.1734107Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1741646Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1745355Z #22 443.6 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:41.1746464Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1747406Z #22 443.6 ptxas info : Compile time = 686.415 ms 2025-09-07T06:31:41.1751055Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1757984Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1761669Z #22 443.6 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:31:41.1762684Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1763581Z #22 443.6 ptxas info : Compile time = 620.418 ms 2025-09-07T06:31:41.1767373Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1775292Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1779569Z #22 443.6 24 bytes stack frame, 24 bytes spill stores, 24 bytes spill loads 2025-09-07T06:31:41.1780752Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1781802Z #22 443.6 ptxas info : Compile time = 981.301 ms 2025-09-07T06:31:41.1786113Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1795020Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1801079Z #22 443.6 24 bytes stack frame, 24 bytes spill stores, 24 bytes spill loads 2025-09-07T06:31:41.1802737Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1804753Z #22 443.6 ptxas info : Compile time = 784.894 ms 2025-09-07T06:31:41.1810661Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1820863Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1826557Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1827776Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.1828797Z #22 443.6 ptxas info : Compile time = 987.531 ms 2025-09-07T06:31:41.1834039Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1843741Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1848996Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1850206Z #22 443.6 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.1851201Z #22 443.6 ptxas info : Compile time = 641.681 ms 2025-09-07T06:31:41.1856674Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1863637Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1867218Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1867955Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.1868711Z #22 443.6 ptxas info : Compile time = 958.581 ms 2025-09-07T06:31:41.1871952Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1877827Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1881172Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1882027Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.1882674Z #22 443.6 ptxas info : Compile time = 767.074 ms 2025-09-07T06:31:41.1886175Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1892520Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1895802Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1896548Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.1897223Z #22 443.6 ptxas info : Compile time = 987.807 ms 2025-09-07T06:31:41.1901845Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1912227Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1918015Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1919244Z #22 443.6 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.1920272Z #22 443.6 ptxas info : Compile time = 664.769 ms 2025-09-07T06:31:41.1925519Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1935313Z #22 443.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1940462Z #22 443.6 24 bytes stack frame, 24 bytes spill stores, 28 bytes spill loads 2025-09-07T06:31:41.1941961Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1943170Z #22 443.6 ptxas info : Compile time = 919.621 ms 2025-09-07T06:31:41.1948275Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1957105Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1960296Z #22 443.6 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:41.1961200Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1962036Z #22 443.6 ptxas info : Compile time = 728.899 ms 2025-09-07T06:31:41.1963684Z #22 443.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1966357Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:41.1968018Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1968761Z #22 443.6 ptxas info : Used 75 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:41.1969410Z #22 443.6 ptxas info : Compile time = 31.622 ms 2025-09-07T06:31:41.1972903Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1978903Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.1982336Z #22 443.6 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:41.1983278Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:41.1984181Z #22 443.6 ptxas info : Compile time = 947.511 ms 2025-09-07T06:31:41.1987136Z #22 443.6 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.1992838Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:41.1996710Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.1997702Z #22 443.6 ptxas info : Used 80 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:41.1998565Z #22 443.6 ptxas info : Compile time = 40.885 ms 2025-09-07T06:31:41.2002703Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.2009749Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:41.2013946Z #22 
443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.2014955Z #22 443.6 ptxas info : Used 69 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:41.2015787Z #22 443.6 ptxas info : Compile time = 24.416 ms 2025-09-07T06:31:41.2020182Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.2028225Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:41.2032663Z #22 443.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:41.2033667Z #22 443.6 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:41.2034548Z #22 443.6 ptxas info : Compile time = 640.610 ms 2025-09-07T06:31:41.2036753Z #22 443.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:41.2040570Z #22 443.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:41.2042934Z #22 443.6 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads 2025-09-07T06:31:41.2043931Z #22 443.6 ptxas info : Used 78 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:41.2044799Z #22 443.6 ptxas info : Compile time = 38.340 ms 2025-09-07T06:31:45.4394122Z #22 448.1 [38/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim192_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:45.5963364Z #22 448.1 ptxas info : 10 bytes gmem 2025-09-07T06:31:45.5968672Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.5977330Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.5981890Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.5982821Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.5983569Z #22 448.1 ptxas info : Compile time = 2.206 ms 2025-09-07T06:31:45.5987914Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.5997877Z #22 448.1 ptxas 
info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6002740Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6003616Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6004330Z #22 448.1 ptxas info : Compile time = 1.114 ms 2025-09-07T06:31:45.6008665Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6017006Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6021354Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6022233Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6022978Z #22 448.1 ptxas info : Compile time = 0.740 ms 2025-09-07T06:31:45.6027193Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6034919Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6039217Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6040103Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6040828Z #22 448.1 ptxas info : Compile time = 31.003 ms 2025-09-07T06:31:45.6045502Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6053962Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6058540Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6059555Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6060275Z #22 448.1 ptxas info : Compile time = 0.872 ms 2025-09-07T06:31:45.6064483Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6072266Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6076550Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6077424Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6078150Z #22 448.1 ptxas info : Compile time = 0.761 ms 2025-09-07T06:31:45.6082589Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6090395Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6169996Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6170924Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6171690Z #22 448.1 ptxas info : Compile time = 0.701 ms 2025-09-07T06:31:45.6176055Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6183735Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6188032Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6188876Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6189614Z #22 448.1 ptxas info : Compile time = 0.654 ms 2025-09-07T06:31:45.6194073Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6202101Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6206343Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6207252Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6207982Z #22 448.1 ptxas info : Compile time = 0.732 ms 2025-09-07T06:31:45.6212158Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6220228Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6224480Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6225380Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6226120Z #22 448.1 ptxas info : Compile time = 0.638 ms 2025-09-07T06:31:45.6230401Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6238507Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:45.6242908Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6243775Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6244531Z #22 448.1 ptxas info : Compile time = 0.683 ms 2025-09-07T06:31:45.6248772Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6256708Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6261238Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6262256Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6262977Z #22 448.1 ptxas info : Compile time = 0.643 ms 2025-09-07T06:31:45.6267359Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6275453Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6279875Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6280770Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6281521Z #22 448.1 ptxas info : Compile time = 0.652 ms 2025-09-07T06:31:45.6286179Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6294664Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6299248Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6300180Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6300911Z #22 448.1 ptxas info : Compile time = 0.659 ms 2025-09-07T06:31:45.6305364Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6313539Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6317217Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6317987Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6318917Z #22 448.1 ptxas info : Compile time = 0.627 ms 2025-09-07T06:31:45.6322633Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6329590Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6334041Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6334895Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6335640Z #22 448.1 ptxas info : Compile time = 0.641 ms 2025-09-07T06:31:45.6340017Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6350110Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6354655Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6355536Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6356264Z #22 448.1 ptxas info : Compile time = 0.672 ms 2025-09-07T06:31:45.6360631Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6368584Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6373186Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6374061Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6374799Z #22 448.1 ptxas info : Compile time = 0.621 ms 2025-09-07T06:31:45.6379219Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6387350Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6391846Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6393071Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6393796Z #22 448.1 ptxas info : Compile time = 0.629 ms 2025-09-07T06:31:45.6398212Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6406341Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6411181Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6412073Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6413010Z #22 448.1 ptxas info : Compile time = 0.609 ms 2025-09-07T06:31:45.6417375Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6425143Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6429521Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6430385Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6431144Z #22 448.1 ptxas info : Compile time = 0.594 ms 2025-09-07T06:31:45.6435490Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6443365Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:45.6448020Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6448913Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6449643Z #22 448.1 ptxas info : Compile time = 0.634 ms 2025-09-07T06:31:45.6451891Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6456063Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:45.6458291Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6459207Z #22 448.1 ptxas info : Used 90 registers, used 0 barriers 2025-09-07T06:31:45.6459944Z #22 448.1 ptxas info : Compile time = 90.013 ms 2025-09-07T06:31:45.6464370Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6472769Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6477266Z #22 448.1 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6478175Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6478924Z #22 448.1 ptxas info : Compile time = 1.039 ms 2025-09-07T06:31:45.6482960Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6490967Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:45.6495695Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6496780Z #22 448.1 ptxas info : Used 80 registers, used 1 barriers 2025-09-07T06:31:45.6497908Z #22 448.1 ptxas info : Compile time = 56.254 ms 2025-09-07T06:31:45.6502116Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6509636Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:45.6514350Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6515578Z #22 448.1 ptxas info : Used 70 registers, used 1 barriers 2025-09-07T06:31:45.6516393Z #22 448.1 ptxas info : Compile time = 36.098 ms 2025-09-07T06:31:45.6521100Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6529398Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6534099Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6535347Z #22 448.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:45.6536334Z #22 448.1 ptxas info : Compile time = 1.037 ms 2025-09-07T06:31:45.6538989Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:45.6543052Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:45.6545547Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6546508Z #22 448.1 ptxas info : Used 76 registers, used 0 barriers 2025-09-07T06:31:45.6547583Z #22 448.1 ptxas info : Compile time = 45.192 ms 2025-09-07T06:31:45.6548418Z #22 448.1 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:45.6552978Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6561248Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6565836Z #22 448.1 24 bytes stack frame, 24 bytes spill stores, 24 bytes spill loads 2025-09-07T06:31:45.6567158Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6568603Z #22 448.1 ptxas info : Compile time = 596.844 ms 
2025-09-07T06:31:45.6573420Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6581679Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6586459Z #22 448.1 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:45.6587893Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6589047Z #22 448.1 ptxas info : Compile time = 557.986 ms 2025-09-07T06:31:45.6594287Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6603968Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6609200Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6610426Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.6611565Z #22 448.1 ptxas info : Compile time = 600.645 ms 2025-09-07T06:31:45.6616610Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6625825Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6630870Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6632270Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.6633349Z #22 448.1 ptxas info : Compile time = 575.096 ms 2025-09-07T06:31:45.6638483Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6645562Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6649149Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6650201Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.6651080Z #22 448.1 ptxas info : Compile time = 691.360 ms 2025-09-07T06:31:45.6654646Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6660612Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6664189Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6665060Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.6665861Z #22 448.1 ptxas info : Compile time = 664.818 ms 2025-09-07T06:31:45.6669933Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6675901Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6679284Z #22 448.1 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:31:45.6680510Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6681440Z #22 448.1 ptxas info : Compile time = 755.513 ms 2025-09-07T06:31:45.6684801Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6690860Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6695229Z #22 448.1 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:31:45.6696665Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6698246Z #22 448.1 ptxas info : Compile time = 661.950 ms 2025-09-07T06:31:45.6702597Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6710601Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6715064Z #22 448.1 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:31:45.6716467Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6717642Z #22 448.1 ptxas info : Compile time = 590.107 ms 2025-09-07T06:31:45.6722146Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6730293Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6735276Z #22 448.1 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:31:45.6736739Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6738327Z #22 448.1 ptxas info : Compile time = 545.922 ms 2025-09-07T06:31:45.6743870Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6752147Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6757629Z #22 448.1 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:45.6759050Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6760355Z #22 448.1 ptxas info : Compile time = 612.925 ms 2025-09-07T06:31:45.6764919Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6773482Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi1ELi1EN4cute5tupleIJNS5_1CILi64EEES8_NS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi64EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6778525Z #22 448.1 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:31:45.6779912Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6781199Z #22 448.1 ptxas info : Compile time = 581.693 ms 2025-09-07T06:31:45.6786051Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6794996Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6799682Z #22 448.1 24 bytes stack frame, 24 bytes spill stores, 24 bytes spill loads 2025-09-07T06:31:45.6801139Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6802332Z #22 448.1 ptxas info : Compile time = 918.212 ms 2025-09-07T06:31:45.6807027Z #22 448.1 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6815818Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6820632Z #22 448.1 24 bytes stack frame, 24 bytes spill stores, 24 bytes spill loads 2025-09-07T06:31:45.6822048Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6842382Z #22 448.1 ptxas info : Compile time = 738.112 ms 2025-09-07T06:31:45.6847039Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6855172Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6859922Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6861093Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.6861977Z #22 448.1 ptxas info : Compile time = 930.809 ms 2025-09-07T06:31:45.6866465Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6874529Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6878984Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6880003Z #22 448.1 ptxas info : Used 253 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.6880835Z #22 448.1 ptxas info : Compile time = 640.438 ms 2025-09-07T06:31:45.6885564Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6893700Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6898610Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6899806Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.6900616Z #22 448.1 ptxas info : Compile time = 945.628 ms 2025-09-07T06:31:45.6905595Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6914905Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6919966Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6921409Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.6922350Z #22 448.1 ptxas info : Compile time = 761.596 ms 2025-09-07T06:31:45.6927094Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6936430Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6941267Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6942362Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.6943274Z #22 448.1 ptxas info : Compile time = 983.399 ms 2025-09-07T06:31:45.6948672Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6957870Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6962873Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.6963916Z #22 448.1 ptxas info : Used 254 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.6964878Z #22 448.1 ptxas info : Compile time = 642.919 ms 2025-09-07T06:31:45.6969758Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6977175Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb1ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6980402Z #22 448.1 24 bytes stack frame, 24 bytes spill stores, 28 bytes spill loads 2025-09-07T06:31:45.6981318Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6982163Z #22 448.1 ptxas info : Compile time = 908.396 ms 2025-09-07T06:31:45.6985355Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.6991295Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.6994846Z #22 448.1 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:31:45.6995786Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.6996589Z #22 448.1 ptxas info : Compile time = 702.154 ms 2025-09-07T06:31:45.6998294Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.7001104Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:45.7002791Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.7003555Z #22 448.1 ptxas info : Used 84 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:45.7004195Z #22 448.1 ptxas info : Compile time = 31.121 ms 2025-09-07T06:31:45.7007785Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.7013970Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb1ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.7017299Z #22 448.1 16 bytes stack frame, 16 bytes spill stores, 20 bytes spill loads 2025-09-07T06:31:45.7018239Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:45.7019082Z #22 448.1 ptxas info : Compile time = 904.620 ms 2025-09-07T06:31:45.7022084Z #22 448.1 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.7028119Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi80EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi4EEENS5_ILi2EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi64EEENS5_ILi16EEESP_EEEEELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:45.7032599Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.7033712Z #22 448.1 ptxas info : Used 80 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:45.7034909Z #22 448.1 ptxas info : Compile time = 42.388 ms 2025-09-07T06:31:45.7039514Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.7047901Z #22 448.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:45.7052638Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.7053667Z #22 448.1 ptxas info : Used 69 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:45.7054470Z #22 448.1 ptxas info : Compile time = 24.543 ms 2025-09-07T06:31:45.7059346Z #22 448.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.7068845Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi80EEENS7_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0ELi2ELi4ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi80EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:45.7073912Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.7075025Z #22 448.1 ptxas info : Used 255 registers, used 2 barriers, 976 bytes cmem[0] 2025-09-07T06:31:45.7075966Z #22 448.1 ptxas info : Compile time = 607.394 ms 2025-09-07T06:31:45.7078532Z #22 448.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:45.7082621Z #22 448.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi192EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:45.7085173Z #22 448.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:45.7086255Z #22 448.1 ptxas info : Used 87 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:45.7087175Z #22 448.1 ptxas info : Compile time = 36.750 ms 2025-09-07T06:31:48.6017431Z #22 451.3 [39/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 
--ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:48.7576124Z #22 451.3 ptxas info : 10 bytes gmem 2025-09-07T06:31:48.7580533Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7589044Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7593686Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7594608Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7595382Z #22 451.3 ptxas info : Compile time = 2.125 ms 
2025-09-07T06:31:48.7600119Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7607893Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7612785Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7613683Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7614431Z #22 451.3 ptxas info : Compile time = 21.152 ms 2025-09-07T06:31:48.7618815Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7627852Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7632399Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7633213Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7633903Z #22 451.3 ptxas info : Compile time = 0.917 ms 2025-09-07T06:31:48.7637826Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7646453Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7650806Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7651679Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7652571Z #22 451.3 ptxas info : Compile time = 0.818 ms 2025-09-07T06:31:48.7657248Z #22 451.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7665479Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7670366Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7671280Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7672050Z #22 451.3 ptxas info : Compile time = 0.926 ms 2025-09-07T06:31:48.7676644Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7685633Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7691319Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7692743Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7693544Z #22 451.3 ptxas info : Compile time = 0.641 ms 2025-09-07T06:31:48.7698241Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7707610Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7712444Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7713428Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7714583Z #22 451.3 ptxas info : Compile time = 0.633 ms 2025-09-07T06:31:48.7719509Z #22 451.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7727951Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7745081Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7746058Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7746852Z #22 451.3 ptxas info : Compile time = 0.591 ms 2025-09-07T06:31:48.7751777Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7760318Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7764914Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7766193Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7766969Z #22 451.3 ptxas info : Compile time = 0.636 ms 2025-09-07T06:31:48.7771797Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7781005Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7786092Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7786957Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7787727Z #22 451.3 ptxas info : Compile time = 0.614 ms 2025-09-07T06:31:48.7793312Z #22 451.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7801928Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7806912Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7807899Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7808685Z #22 451.3 ptxas info : Compile time = 0.594 ms 2025-09-07T06:31:48.7813694Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7822834Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7827710Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7828705Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7829421Z #22 451.3 ptxas info : Compile time = 0.626 ms 2025-09-07T06:31:48.7834406Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7843838Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7848656Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7849563Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7850297Z #22 451.3 ptxas info : Compile time = 0.608 ms 2025-09-07T06:31:48.7855037Z #22 451.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7863352Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7868025Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7868897Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7869638Z #22 451.3 ptxas info : Compile time = 0.624 ms 2025-09-07T06:31:48.7874120Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7882732Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7887405Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7888316Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7889067Z #22 451.3 ptxas info : Compile time = 0.614 ms 2025-09-07T06:31:48.7894105Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7903089Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7908035Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7908971Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7909708Z #22 451.3 ptxas info : Compile time = 0.695 ms 2025-09-07T06:31:48.7914360Z #22 451.3 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7923057Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7927059Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7927835Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7928472Z #22 451.3 ptxas info : Compile time = 0.600 ms 2025-09-07T06:31:48.7932997Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:48.7941375Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7945872Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7946760Z #22 451.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:48.7947486Z #22 451.3 ptxas info : Compile time = 0.625 ms 2025-09-07T06:31:48.7948290Z #22 451.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:48.7952695Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.7960767Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7965533Z #22 451.3 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:48.7966737Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.7967870Z #22 451.3 ptxas info : Compile 
time = 1022.409 ms 2025-09-07T06:31:48.7972212Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.7980688Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.7985396Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.7986519Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:31:48.7987755Z #22 451.3 ptxas info : Compile time = 979.728 ms 2025-09-07T06:31:48.7992460Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8000205Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8004600Z #22 451.3 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:31:48.8005848Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.8006924Z #22 451.3 ptxas info : Compile time = 2197.363 ms 2025-09-07T06:31:48.8011333Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8019828Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8024729Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.8025754Z #22 451.3 ptxas info : Used 246 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:31:48.8026649Z #22 451.3 ptxas info : Compile time = 1737.945 ms 
2025-09-07T06:31:48.8031499Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8040190Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8045128Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.8046282Z #22 451.3 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:31:48.8047287Z #22 451.3 ptxas info : Compile time = 1788.110 ms 2025-09-07T06:31:48.8052239Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8061054Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8065878Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.8066950Z #22 451.3 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:31:48.8067892Z #22 451.3 ptxas info : Compile time = 3183.753 ms 2025-09-07T06:31:48.8072818Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8080800Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8085337Z #22 451.3 128 bytes stack frame, 148 bytes spill stores, 316 bytes spill loads 2025-09-07T06:31:48.8086612Z #22 451.3 ptxas info : Used 255 registers, used 6 barriers, 128 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:31:48.8087886Z #22 451.3 ptxas info : Compile time = 2036.965 ms 
2025-09-07T06:31:48.8092236Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8099966Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8104156Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.8105077Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:31:48.8105892Z #22 451.3 ptxas info : Compile time = 1898.911 ms 2025-09-07T06:31:48.8110404Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8117856Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8122380Z #22 451.3 96 bytes stack frame, 132 bytes spill stores, 244 bytes spill loads 2025-09-07T06:31:48.8123773Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.8124839Z #22 451.3 ptxas info : Compile time = 3705.698 ms 2025-09-07T06:31:48.8129174Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8137331Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8141497Z #22 451.3 88 bytes stack frame, 104 bytes spill stores, 136 bytes spill loads 2025-09-07T06:31:48.8142684Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.8143700Z #22 451.3 ptxas info : Compile 
time = 1179.503 ms 2025-09-07T06:31:48.8148333Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8156421Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8160875Z #22 451.3 144 bytes stack frame, 164 bytes spill stores, 192 bytes spill loads 2025-09-07T06:31:48.8161976Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 144 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.8162951Z #22 451.3 ptxas info : Compile time = 1270.487 ms 2025-09-07T06:31:48.8167225Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8175719Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8180096Z #22 451.3 168 bytes stack frame, 220 bytes spill stores, 260 bytes spill loads 2025-09-07T06:31:48.8181225Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 168 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.8182332Z #22 451.3 ptxas info : Compile time = 2751.787 ms 2025-09-07T06:31:48.8186601Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8194713Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8199239Z #22 451.3 128 bytes stack frame, 168 bytes spill stores, 244 bytes spill loads 2025-09-07T06:31:48.8200403Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 128 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.8201441Z 
#22 451.3 ptxas info : Compile time = 2511.670 ms 2025-09-07T06:31:48.8205775Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8213787Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8218520Z #22 451.3 120 bytes stack frame, 176 bytes spill stores, 248 bytes spill loads 2025-09-07T06:31:48.8219860Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 120 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.8220904Z #22 451.3 ptxas info : Compile time = 2580.215 ms 2025-09-07T06:31:48.8225123Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8233354Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8237655Z #22 451.3 248 bytes stack frame, 196 bytes spill stores, 436 bytes spill loads 2025-09-07T06:31:48.8238877Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 248 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.8239905Z #22 451.3 ptxas info : Compile time = 5351.926 ms 2025-09-07T06:31:48.8245865Z #22 451.3 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi48EEES4_EEELi128EN7cutlass10bfloat16_tEfNS7_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISB_NS_21CollectiveEpilogueFwdINS2_IJS4_S4_S5_EEENS2_IJNS3_ILi1EEESG_SG_EEES8_SA_Li128ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESR_EEESR_NS3_ILi16EEEEEENS2_IJNS2_IJSG_SR_EEENS3_ILi4EEENS3_ILi8EEEEEEEEEENS_7SoftmaxILi4ELi0EEEEEbRKNSB_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:31:48.8252239Z #22 451.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:48.8257144Z #22 451.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8265431Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8269916Z #22 451.3 176 bytes stack frame, 260 bytes spill stores, 656 bytes spill loads 2025-09-07T06:31:48.8271067Z #22 451.3 ptxas info : Used 255 registers, used 6 barriers, 176 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:31:48.8272191Z #22 451.3 ptxas info : Compile time = 2175.613 ms 2025-09-07T06:31:48.8276916Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8284790Z #22 451.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8289160Z #22 451.3 200 bytes stack frame, 276 bytes spill stores, 364 bytes spill loads 2025-09-07T06:31:48.8290264Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 200 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.8291423Z #22 451.3 ptxas info : Compile time = 1961.740 ms 2025-09-07T06:31:48.8296289Z #22 451.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:48.8304944Z #22 451.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:48.8310030Z #22 451.3 160 bytes stack frame, 240 bytes spill stores, 328 bytes spill loads 2025-09-07T06:31:48.8311430Z #22 451.3 ptxas info : Used 255 registers, used 2 barriers, 160 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:31:48.8312640Z 
#22 451.3 ptxas info : Compile time = 4253.637 ms 2025-09-07T06:31:53.5830123Z #22 456.3 [40/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:53.7348178Z #22 456.3 ptxas info : 10 bytes gmem 2025-09-07T06:31:53.7353838Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7362903Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7367459Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7368495Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7369409Z #22 456.3 ptxas info : Compile time = 2.157 ms 2025-09-07T06:31:53.7374728Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7383701Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7388572Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7389668Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7390468Z #22 456.3 ptxas info : Compile time = 1.055 ms 2025-09-07T06:31:53.7395523Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7403230Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7408192Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7409123Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7409911Z #22 456.3 ptxas info : Compile time = 0.721 ms 2025-09-07T06:31:53.7414741Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7423219Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7427942Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7428896Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7429932Z #22 456.3 ptxas info : Compile time = 0.655 ms 2025-09-07T06:31:53.7434677Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7443218Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7447978Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7448940Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7449580Z #22 456.3 ptxas info : Compile time = 0.647 ms 2025-09-07T06:31:53.7453871Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7474225Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7478941Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7480062Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7480865Z #22 456.3 ptxas info : Compile time = 0.625 ms 2025-09-07T06:31:53.7485555Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7494504Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7499248Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7500211Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7501014Z #22 456.3 ptxas info : Compile time = 0.676 ms 2025-09-07T06:31:53.7505056Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7513499Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7518127Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7519091Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7519895Z #22 456.3 ptxas info : Compile time = 0.624 ms 2025-09-07T06:31:53.7524418Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7532847Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7537433Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7538372Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7539191Z #22 456.3 ptxas info : Compile time = 0.641 ms 2025-09-07T06:31:53.7543802Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7552246Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7557398Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7558276Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7559009Z #22 456.3 ptxas info : Compile time = 0.605 ms 2025-09-07T06:31:53.7563520Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7571882Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:53.7576701Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7577851Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7578644Z #22 456.3 ptxas info : Compile time = 0.608 ms 2025-09-07T06:31:53.7582921Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7590654Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:53.7594756Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7595588Z #22 456.3 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:31:53.7596297Z #22 456.3 ptxas info : Compile time = 40.076 ms 2025-09-07T06:31:53.7600513Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7608540Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7613481Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7614374Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7615120Z #22 456.3 ptxas info : Compile time = 1.008 ms 2025-09-07T06:31:53.7619508Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7627572Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7631568Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7632431Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7633142Z #22 456.3 ptxas info : Compile time = 0.780 ms 2025-09-07T06:31:53.7637622Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7645933Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7650591Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7651549Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7652492Z #22 456.3 ptxas info : Compile time = 0.658 ms 2025-09-07T06:31:53.7657085Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7665460Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:53.7670128Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7671077Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7671858Z #22 456.3 ptxas info : Compile time = 0.630 ms 2025-09-07T06:31:53.7676383Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7684793Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7689389Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7690284Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7691034Z #22 456.3 ptxas info : Compile time = 0.635 ms 2025-09-07T06:31:53.7695772Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7704004Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7708764Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7709684Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7710428Z #22 456.3 ptxas info : Compile time = 0.644 ms 2025-09-07T06:31:53.7714838Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7722728Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7727154Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7728085Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7728830Z #22 456.3 ptxas info : Compile time = 0.630 ms 2025-09-07T06:31:53.7733375Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7741405Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7746156Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7747069Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7747841Z #22 456.3 ptxas info : Compile time = 0.629 ms 2025-09-07T06:31:53.7752468Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7760473Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 
2025-09-07T06:31:53.7764861Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7765777Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7766515Z #22 456.3 ptxas info : Compile time = 0.684 ms 2025-09-07T06:31:53.7770969Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7778900Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7783099Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7783960Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7784707Z #22 456.3 ptxas info : Compile time = 0.615 ms 2025-09-07T06:31:53.7788919Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7797141Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7801610Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7802560Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7803369Z #22 456.3 ptxas info : Compile time = 0.745 ms 2025-09-07T06:31:53.7805767Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7809381Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:53.7811523Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7812688Z #22 456.3 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:31:53.7813426Z #22 456.3 ptxas info : Compile time = 36.640 ms 2025-09-07T06:31:53.7817876Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7826031Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7830577Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7831535Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7832326Z #22 456.3 ptxas info : Compile time = 0.950 ms 2025-09-07T06:31:53.7836809Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7844362Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:53.7848601Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7849557Z #22 456.3 ptxas info : Used 88 registers, used 1 barriers 2025-09-07T06:31:53.7850374Z #22 456.3 ptxas info : Compile time = 48.453 ms 2025-09-07T06:31:53.7854767Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7861984Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:53.7865836Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7866761Z #22 456.3 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:31:53.7867533Z #22 456.3 ptxas info : Compile time = 27.890 ms 2025-09-07T06:31:53.7872032Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7880429Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7884993Z #22 456.3 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7885955Z #22 456.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:53.7886750Z #22 456.3 ptxas info : Compile time = 1.000 ms 2025-09-07T06:31:53.7889107Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:53.7893348Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:53.7895852Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.7896813Z #22 456.3 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:31:53.7897604Z #22 456.3 ptxas info : Compile time = 39.622 ms 2025-09-07T06:31:53.7898588Z #22 456.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:53.7903330Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.7911891Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7916008Z 
#22 456.3 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:31:53.7917212Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.7918298Z #22 456.3 ptxas info : Compile time = 685.713 ms 2025-09-07T06:31:53.7922840Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.7931198Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7936189Z #22 456.3 24 bytes stack frame, 24 bytes spill stores, 32 bytes spill loads 2025-09-07T06:31:53.7937500Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.7938663Z #22 456.3 ptxas info : Compile time = 637.858 ms 2025-09-07T06:31:53.7943438Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:31:53.7951955Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7956685Z #22 456.3 40 bytes stack frame, 40 bytes spill stores, 44 bytes spill loads 2025-09-07T06:31:53.7958011Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.7959190Z #22 456.3 ptxas info : Compile time = 710.566 ms 2025-09-07T06:31:53.7964074Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.7971793Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7976537Z #22 456.3 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:31:53.7977839Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.7978976Z #22 456.3 ptxas info : Compile time = 
639.918 ms 2025-09-07T06:31:53.7983632Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.7992300Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.7996540Z #22 456.3 32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:31:53.7997743Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.7998779Z #22 456.3 ptxas info : Compile time = 742.210 ms 2025-09-07T06:31:53.8003370Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8011956Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8016806Z #22 456.3 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:31:53.8018089Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8019225Z #22 456.3 ptxas info : Compile time = 679.115 ms 2025-09-07T06:31:53.8023940Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8032450Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8037343Z #22 456.3 48 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:31:53.8038725Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8039959Z #22 456.3 ptxas info : Compile time = 746.057 ms 2025-09-07T06:31:53.8044742Z #22 456.3 ptxas info : Compiling 
entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8052636Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8056722Z #22 456.3 32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:31:53.8057910Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8058999Z #22 456.3 ptxas info : Compile time = 688.504 ms 2025-09-07T06:31:53.8063500Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8071876Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8076280Z #22 456.3 64 bytes stack frame, 68 bytes spill stores, 84 bytes spill loads 2025-09-07T06:31:53.8077577Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8078681Z #22 456.3 ptxas info : Compile time = 716.016 ms 2025-09-07T06:31:53.8082833Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8090045Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8094919Z #22 456.3 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:31:53.8096207Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8097356Z #22 456.3 ptxas info : Compile time = 644.659 ms 2025-09-07T06:31:53.8101770Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8109852Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8114153Z #22 456.3 48 bytes stack frame, 48 bytes spill stores, 56 bytes spill loads 2025-09-07T06:31:53.8115387Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8116476Z #22 456.3 ptxas info : Compile time = 724.435 ms 2025-09-07T06:31:53.8120540Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8127814Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:53.8132247Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.8133436Z #22 456.3 ptxas info : Used 69 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:53.8134501Z #22 456.3 ptxas info : Compile time = 27.578 ms 2025-09-07T06:31:53.8139049Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8147402Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8151960Z #22 456.3 48 bytes stack frame, 48 bytes spill stores, 52 bytes spill loads 2025-09-07T06:31:53.8153253Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8154347Z #22 456.3 ptxas info : Compile time = 672.226 ms 2025-09-07T06:31:53.8158954Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8167235Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8171791Z #22 456.3 80 bytes stack frame, 80 bytes spill stores, 108 bytes spill loads 2025-09-07T06:31:53.8173103Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8174114Z #22 456.3 ptxas info : Compile time = 802.695 ms 2025-09-07T06:31:53.8178024Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8185819Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8190322Z #22 456.3 48 bytes stack frame, 44 bytes spill stores, 60 bytes spill loads 2025-09-07T06:31:53.8191542Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8193103Z #22 456.3 ptxas info : Compile time = 765.688 ms 2025-09-07T06:31:53.8197693Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8206154Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8210782Z #22 456.3 80 bytes stack frame, 80 bytes spill stores, 84 bytes spill loads 2025-09-07T06:31:53.8212110Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8213408Z #22 456.3 ptxas info : Compile time = 868.308 ms 2025-09-07T06:31:53.8217926Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8225704Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8230195Z #22 456.3 80 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads 2025-09-07T06:31:53.8231464Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8232619Z #22 456.3 ptxas info : Compile time = 734.085 ms 2025-09-07T06:31:53.8237157Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8245390Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8250013Z #22 456.3 64 bytes stack frame, 64 bytes spill stores, 88 bytes spill loads 2025-09-07T06:31:53.8251330Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8252644Z #22 456.3 ptxas info : Compile time = 882.190 ms 2025-09-07T06:31:53.8257259Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8265743Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8270313Z #22 456.3 72 bytes stack frame, 68 bytes spill stores, 84 bytes spill loads 2025-09-07T06:31:53.8271417Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8272430Z #22 456.3 ptxas info : Compile time = 796.364 ms 2025-09-07T06:31:53.8276713Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8285006Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8289588Z #22 456.3 88 bytes stack frame, 88 bytes spill stores, 92 bytes spill loads 2025-09-07T06:31:53.8290883Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8292412Z #22 456.3 ptxas info : Compile time = 903.542 ms 2025-09-07T06:31:53.8297037Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8305330Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8309952Z #22 456.3 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads 2025-09-07T06:31:53.8311272Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8312428Z #22 456.3 ptxas info : Compile time = 809.328 ms 2025-09-07T06:31:53.8316918Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8324441Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8328867Z #22 456.3 104 bytes stack frame, 104 bytes spill stores, 136 bytes spill loads 2025-09-07T06:31:53.8330262Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8331412Z #22 456.3 ptxas info : Compile time = 852.629 ms 2025-09-07T06:31:53.8335934Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8343837Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8348284Z #22 456.3 96 bytes stack frame, 96 bytes spill stores, 120 bytes spill loads 2025-09-07T06:31:53.8349594Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8350746Z #22 456.3 ptxas info : Compile time = 754.296 ms 2025-09-07T06:31:53.8353325Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8357188Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:53.8359609Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.8360671Z #22 456.3 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:53.8361587Z #22 456.3 ptxas info : Compile time = 35.376 ms 2025-09-07T06:31:53.8366250Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8374651Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8378659Z #22 456.3 96 bytes stack frame, 92 bytes spill stores, 96 bytes spill loads 2025-09-07T06:31:53.8379910Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8381025Z #22 456.3 ptxas info : Compile time = 875.830 ms 2025-09-07T06:31:53.8385089Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8392814Z #22 456.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:53.8396945Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.8398000Z #22 456.3 ptxas info : Used 90 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:53.8398910Z #22 456.3 ptxas info : Compile time = 38.153 ms 2025-09-07T06:31:53.8403132Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8410534Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_28SM80_16x8x16_F32F16F16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:53.8414833Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.8416131Z #22 456.3 ptxas info : Used 55 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:53.8416973Z #22 456.3 ptxas info : Compile time = 22.907 ms 2025-09-07T06:31:53.8421150Z #22 456.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8429412Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:53.8434000Z #22 456.3 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads 2025-09-07T06:31:53.8435292Z #22 456.3 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:53.8436456Z #22 456.3 ptxas info : Compile time = 773.722 ms 2025-09-07T06:31:53.8438650Z #22 456.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:53.8442314Z #22 456.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_6half_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:53.8444602Z #22 456.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:53.8445631Z #22 456.3 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:53.8446565Z #22 456.3 ptxas info : Compile time = 39.344 ms 2025-09-07T06:31:57.9054828Z #22 460.6 [41/154] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:31:58.0566851Z #22 460.6 ptxas info : 10 bytes 
gmem 2025-09-07T06:31:58.0572196Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0581439Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0586821Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0587990Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0588918Z #22 460.6 ptxas info : Compile time = 2.188 ms 2025-09-07T06:31:58.0594353Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0603851Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0609296Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0610402Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0611342Z #22 460.6 ptxas info : Compile time = 1.095 ms 2025-09-07T06:31:58.0616786Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0625861Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0631160Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0632326Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0633279Z #22 460.6 ptxas info : Compile time = 0.742 ms 2025-09-07T06:31:58.0638835Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0647991Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0653517Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0654602Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0655479Z #22 460.6 ptxas info : Compile time = 0.670 ms 2025-09-07T06:31:58.0660521Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0669829Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0675055Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0676175Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0677127Z #22 460.6 ptxas info : Compile time = 0.665 ms 2025-09-07T06:31:58.0682268Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0691764Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0702679Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0703654Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0704486Z #22 460.6 ptxas info : Compile time = 0.634 ms 2025-09-07T06:31:58.0709564Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0718949Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0723898Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0724880Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0725686Z #22 460.6 ptxas info : Compile time = 0.744 ms 2025-09-07T06:31:58.0730786Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0740088Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0745242Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0746187Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0747006Z #22 460.6 ptxas info : Compile time = 0.621 ms 2025-09-07T06:31:58.0751841Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0760716Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0765930Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0767064Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0767899Z #22 460.6 ptxas info : Compile time = 0.649 ms 2025-09-07T06:31:58.0772516Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0781349Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0786247Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0787271Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0788081Z #22 460.6 ptxas info : Compile time = 0.617 ms 2025-09-07T06:31:58.0793852Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0802948Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0807885Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0808844Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0809695Z #22 460.6 ptxas info : Compile time = 0.629 ms 2025-09-07T06:31:58.0814096Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0822359Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:58.0827090Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0828084Z #22 460.6 ptxas info : Used 72 registers, used 1 barriers 2025-09-07T06:31:58.0828911Z #22 460.6 ptxas info : Compile time = 39.627 ms 2025-09-07T06:31:58.0834270Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0843260Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0848191Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0849214Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0850035Z #22 460.6 ptxas info : Compile time = 0.939 ms 2025-09-07T06:31:58.0854833Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0863997Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0868826Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0869774Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0870601Z #22 460.6 ptxas info : Compile time = 0.709 ms 2025-09-07T06:31:58.0875553Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.0884358Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.0889340Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.0890373Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.0891201Z #22 460.6 ptxas info : Compile time = 0.589 ms 2025-09-07T06:31:58.1000143Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1009275Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1014097Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1015025Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.1015854Z #22 460.6 ptxas info : Compile time = 0.880 ms 2025-09-07T06:31:58.1020819Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1029876Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1034929Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1036235Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.1037015Z #22 460.6 ptxas info : Compile time = 0.538 ms 2025-09-07T06:31:58.1041771Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1050501Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1055574Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1056607Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.1057374Z #22 460.6 ptxas info : Compile time = 0.567 ms 2025-09-07T06:31:58.1062343Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1071240Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1076124Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1077326Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.1078093Z #22 460.6 ptxas info : Compile time = 0.519 ms 2025-09-07T06:31:58.1083035Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1091802Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1097303Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1098335Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.1099158Z #22 460.6 ptxas info : Compile time = 0.519 ms 2025-09-07T06:31:58.1104522Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1113683Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1118758Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1119781Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.1120579Z #22 460.6 ptxas info : Compile time = 0.604 ms 2025-09-07T06:31:58.1125404Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1134010Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1138692Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1139698Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.1140558Z #22 460.6 ptxas info : Compile time = 0.538 ms 2025-09-07T06:31:58.1144970Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1153644Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1158564Z #22 460.6 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1159586Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.1160396Z #22 460.6 ptxas info : Compile time = 0.537 ms 2025-09-07T06:31:58.1162939Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1167101Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:58.1169660Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1170681Z #22 460.6 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:31:58.1171516Z #22 460.6 ptxas info : Compile time = 33.970 ms 2025-09-07T06:31:58.1176855Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1185796Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1190717Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:31:58.1191720Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.1192808Z #22 460.6 ptxas info : Compile time = 0.981 ms 2025-09-07T06:31:58.1197197Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1205323Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:58.1209826Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1210850Z #22 460.6 ptxas info : Used 88 registers, used 1 barriers 2025-09-07T06:31:58.1211697Z #22 460.6 ptxas info : Compile time = 49.946 ms 2025-09-07T06:31:58.1216389Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1224873Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:58.1229685Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1230701Z #22 460.6 ptxas info : Used 55 registers, used 1 barriers 2025-09-07T06:31:58.1231529Z #22 460.6 ptxas info : Compile time = 29.058 ms 2025-09-07T06:31:58.1236482Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1245756Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1250703Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1251713Z #22 460.6 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:31:58.1252707Z #22 460.6 ptxas info : Compile time = 1.029 ms 2025-09-07T06:31:58.1255271Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:31:58.1259423Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:58.1262025Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1263065Z #22 460.6 ptxas info : Used 64 registers, used 0 barriers 2025-09-07T06:31:58.1263896Z #22 460.6 ptxas info : Compile time = 41.545 ms 2025-09-07T06:31:58.1264718Z #22 460.6 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:31:58.1269706Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1278712Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1283789Z #22 460.6 8 bytes stack frame, 8 bytes spill stores, 8 bytes spill loads 2025-09-07T06:31:58.1285352Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1286564Z #22 460.6 ptxas info : Compile time = 
740.568 ms 2025-09-07T06:31:58.1291519Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1301148Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1306121Z #22 460.6 24 bytes stack frame, 24 bytes spill stores, 32 bytes spill loads 2025-09-07T06:31:58.1307479Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 24 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1308673Z #22 460.6 ptxas info : Compile time = 672.700 ms 2025-09-07T06:31:58.1313989Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1323048Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1328103Z #22 460.6 40 bytes stack frame, 40 bytes spill stores, 44 bytes spill loads 2025-09-07T06:31:58.1329480Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1330701Z #22 460.6 ptxas info : Compile time = 752.868 ms 2025-09-07T06:31:58.1335835Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1345076Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1350141Z #22 460.6 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:31:58.1351531Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1352754Z #22 460.6 ptxas info : Compile time = 755.647 ms 2025-09-07T06:31:58.1357838Z #22 460.6 ptxas info : 
Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1367181Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1372279Z #22 460.6 32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:31:58.1373850Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1375035Z #22 460.6 ptxas info : Compile time = 753.983 ms 2025-09-07T06:31:58.1380081Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1389356Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1394323Z #22 460.6 32 bytes stack frame, 32 bytes spill stores, 32 bytes spill loads 2025-09-07T06:31:58.1395559Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1396649Z #22 460.6 ptxas info : Compile time = 673.353 ms 2025-09-07T06:31:58.1401059Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1409044Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1413818Z #22 460.6 48 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:31:58.1415072Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1416274Z #22 460.6 ptxas info : Compile time = 759.142 ms 2025-09-07T06:31:58.1421011Z #22 460.6 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1430108Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1435170Z #22 460.6 32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads 2025-09-07T06:31:58.1436570Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 32 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1437755Z #22 460.6 ptxas info : Compile time = 691.961 ms 2025-09-07T06:31:58.1442537Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1451265Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1456543Z #22 460.6 64 bytes stack frame, 68 bytes spill stores, 84 bytes spill loads 2025-09-07T06:31:58.1457948Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1459177Z #22 460.6 ptxas info : Compile time = 716.521 ms 2025-09-07T06:31:58.1463890Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1472599Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1477374Z #22 460.6 40 bytes stack frame, 40 bytes spill stores, 40 bytes spill loads 2025-09-07T06:31:58.1478718Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1479894Z #22 460.6 ptxas info : Compile time = 644.080 ms 2025-09-07T06:31:58.1484827Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1494351Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_21CollectiveEpilogueBwdISB_SC_SE_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1499571Z #22 460.6 48 bytes stack frame, 48 bytes spill stores, 56 bytes spill loads 2025-09-07T06:31:58.1500943Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1502338Z #22 460.6 ptxas info : Compile time = 730.645 ms 2025-09-07T06:31:58.1506846Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1515014Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi96EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:58.1519541Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1520633Z #22 460.6 ptxas info : Used 69 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:58.1521579Z #22 460.6 ptxas info : Compile time = 29.875 ms 2025-09-07T06:31:58.1526908Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1535891Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi1EN4cute5tupleIJNS5_1CILi64EEENS7_ILi96EEENS7_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb1EEENS1_24CollectiveEpilogueBwdGQAISB_fSE_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi96EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1540858Z #22 460.6 48 bytes stack frame, 48 bytes spill stores, 52 bytes spill loads 2025-09-07T06:31:58.1542227Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1543424Z #22 460.6 ptxas info : Compile time = 673.989 ms 2025-09-07T06:31:58.1548256Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1556964Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1561867Z #22 460.6 80 bytes stack frame, 80 bytes spill stores, 108 bytes spill loads 2025-09-07T06:31:58.1563157Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1564508Z #22 460.6 ptxas info : Compile time = 817.546 ms 2025-09-07T06:31:58.1569337Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1578694Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1583862Z #22 460.6 48 bytes stack frame, 44 bytes spill stores, 60 bytes spill loads 2025-09-07T06:31:58.1585352Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 48 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1586712Z #22 460.6 ptxas info : Compile time = 764.248 ms 2025-09-07T06:31:58.1591760Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1600727Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1620934Z #22 460.6 80 bytes stack frame, 80 bytes spill stores, 84 bytes spill loads 2025-09-07T06:31:58.1622341Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1623590Z #22 460.6 ptxas info : Compile time = 870.886 ms 2025-09-07T06:31:58.1628510Z #22 460.6 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1637546Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1642476Z #22 460.6 80 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads 2025-09-07T06:31:58.1643901Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 80 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1645158Z #22 460.6 ptxas info : Compile time = 742.732 ms 2025-09-07T06:31:58.1650105Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1659547Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1664758Z #22 460.6 64 bytes stack frame, 64 bytes spill stores, 88 bytes spill loads 2025-09-07T06:31:58.1666141Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 64 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1667376Z #22 460.6 ptxas info : Compile time = 886.899 ms 2025-09-07T06:31:58.1672271Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1681419Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1686307Z #22 460.6 72 bytes stack frame, 68 bytes spill stores, 84 bytes spill loads 2025-09-07T06:31:58.1687671Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 72 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1688843Z #22 460.6 ptxas info : Compile time = 799.357 ms 2025-09-07T06:31:58.1697713Z #22 460.6 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1706712Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1711606Z #22 460.6 88 bytes stack frame, 88 bytes spill stores, 92 bytes spill loads 2025-09-07T06:31:58.1712953Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1714177Z #22 460.6 ptxas info : Compile time = 922.560 ms 2025-09-07T06:31:58.1719098Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1728056Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1733580Z #22 460.6 88 bytes stack frame, 88 bytes spill stores, 88 bytes spill loads 2025-09-07T06:31:58.1734973Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1736206Z #22 460.6 ptxas info : Compile time = 813.776 ms 2025-09-07T06:31:58.1741006Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1749753Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb0ELb0ELi4EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1754343Z #22 460.6 104 bytes stack frame, 104 bytes spill stores, 136 bytes spill loads 2025-09-07T06:31:58.1756071Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 104 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1757299Z #22 460.6 ptxas info : Compile time = 861.438 ms 2025-09-07T06:31:58.1762018Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1770494Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb0ELb0EEENS1_25SingleTileBwdLPTSchedulerEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1775493Z #22 460.6 96 bytes stack frame, 96 bytes spill stores, 120 bytes spill loads 2025-09-07T06:31:58.1776897Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1778111Z #22 460.6 ptxas info : Compile time = 755.479 ms 2025-09-07T06:31:58.1780634Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1784790Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:58.1787375Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1788506Z #22 460.6 ptxas info : Used 62 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:58.1789469Z #22 460.6 ptxas info : Compile time = 34.767 ms 2025-09-07T06:31:58.1794323Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1803079Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_21CollectiveEpilogueBwdISA_SB_SD_Li256ELb1ELb0ELi4EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1808028Z #22 460.6 96 bytes stack frame, 92 bytes spill stores, 96 bytes spill loads 2025-09-07T06:31:58.1809394Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1810586Z #22 460.6 ptxas info : Compile time = 874.708 ms 2025-09-07T06:31:58.1815279Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1823704Z #22 460.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi128EEES6_EEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSI_SG_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEENS5_ILi64EEENS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:58.1828243Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1829347Z #22 460.6 ptxas info : Used 90 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:58.1830304Z #22 460.6 ptxas info : Compile time = 38.404 ms 2025-09-07T06:31:58.1834797Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1842965Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash32FlashAttnBwdPostprocessConvertdQIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELi256ENS3_8TiledMMAINS3_8MMA_AtomIJNS3_30SM80_16x8x16_F32BF16BF16F32_TNEEEENS3_6LayoutINS4_IJNS5_ILi2EEENS5_ILi4EEENS5_ILi1EEEEEENS4_IJSJ_SH_NS5_ILi0EEEEEEEENS4_IJNS5_ILi32EEES6_NS5_ILi16EEEEEEEELb0EEEEEvNT_6ParamsE 2025-09-07T06:31:58.1847529Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1848636Z #22 460.6 ptxas info : Used 55 registers, used 1 barriers, 464 bytes cmem[0] 2025-09-07T06:31:58.1849549Z #22 460.6 ptxas info : Compile time = 22.972 ms 2025-09-07T06:31:58.1854533Z #22 460.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1863479Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnBwdSm80INS1_25CollectiveMainloopBwdSm80ILi2ELi2EN4cute5tupleIJNS5_1CILi64EEENS7_ILi128EEES9_EEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb0ELi2ELi2ELi2ELi2ELb0EEENS1_24CollectiveEpilogueBwdGQAISA_fSD_Li256ELb1ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb0ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:31:58.1868560Z #22 460.6 96 bytes stack frame, 96 bytes spill stores, 96 bytes spill loads 2025-09-07T06:31:58.1870064Z #22 460.6 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 976 bytes cmem[0] 2025-09-07T06:31:58.1871294Z #22 460.6 ptxas info : Compile time = 770.392 ms 2025-09-07T06:31:58.1873814Z #22 460.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:31:58.1877630Z #22 460.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash22FlashAttnBwdPreprocessIN4cute5tupleIJNS3_1CILi64EEENS5_ILi128EEEEEENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb1EEEEEvNT_6ParamsE 2025-09-07T06:31:58.1880105Z #22 460.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:31:58.1881146Z #22 460.6 ptxas info : Used 64 registers, used 0 barriers, 592 bytes cmem[0] 2025-09-07T06:31:58.1882073Z #22 460.6 ptxas info : Compile time = 38.502 ms 2025-09-07T06:32:00.3975765Z #22 463.1 [42/154] /usr/local/cuda/bin/nvcc 
--generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:00.3993504Z #22 463.1 
ptxas info : 10 bytes gmem 2025-09-07T06:32:00.3997853Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4005784Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4012574Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4013522Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4014282Z #22 463.1 ptxas info : Compile time = 2.174 ms 2025-09-07T06:32:00.4018747Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4026721Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4031204Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4032158Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4032940Z #22 463.1 ptxas info : Compile time = 21.243 ms 2025-09-07T06:32:00.4037791Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4045735Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4049955Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4050819Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4051527Z #22 463.1 ptxas info : Compile time = 0.898 ms 2025-09-07T06:32:00.4056253Z #22 463.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4064456Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4069015Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4070195Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4070933Z #22 463.1 ptxas info : Compile time = 0.829 ms 2025-09-07T06:32:00.4075543Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4083832Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4088375Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4089257Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4090027Z #22 463.1 ptxas info : Compile time = 0.960 ms 2025-09-07T06:32:00.4099233Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4107850Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4112414Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4113329Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4114100Z #22 463.1 ptxas info : Compile time = 0.692 ms 2025-09-07T06:32:00.4118660Z #22 463.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4126890Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4131542Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4132674Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4133422Z #22 463.1 ptxas info : Compile time = 0.638 ms 2025-09-07T06:32:00.4137815Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4146281Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4150516Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4151353Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4152080Z #22 463.1 ptxas info : Compile time = 0.635 ms 2025-09-07T06:32:00.4156524Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4164715Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4169216Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4170196Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4170834Z #22 463.1 ptxas info : Compile time = 0.633 ms 2025-09-07T06:32:00.4175663Z #22 463.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4183923Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4188065Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4188894Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4189659Z #22 463.1 ptxas info : Compile time = 0.623 ms 2025-09-07T06:32:00.4194520Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4203087Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4207977Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4208885Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4209644Z #22 463.1 ptxas info : Compile time = 0.605 ms 2025-09-07T06:32:00.4214358Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4223211Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4228433Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4229361Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4230103Z #22 463.1 ptxas info : Compile time = 0.658 ms 2025-09-07T06:32:00.4234890Z #22 463.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4242951Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4247448Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4248348Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4249081Z #22 463.1 ptxas info : Compile time = 0.604 ms 2025-09-07T06:32:00.4254005Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4262156Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4267111Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4268028Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4268885Z #22 463.1 ptxas info : Compile time = 0.619 ms 2025-09-07T06:32:00.4273526Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4281865Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4286501Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4287407Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4288181Z #22 463.1 ptxas info : Compile time = 0.657 ms 2025-09-07T06:32:00.4293881Z #22 463.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4302706Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4307632Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4308534Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4309244Z #22 463.1 ptxas info : Compile time = 0.682 ms 2025-09-07T06:32:00.4313976Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4322539Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4327326Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4328485Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4329248Z #22 463.1 ptxas info : Compile time = 0.616 ms 2025-09-07T06:32:00.4334105Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:00.4342839Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4347417Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4348344Z #22 463.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:00.4349113Z #22 463.1 ptxas info : Compile time = 0.642 ms 2025-09-07T06:32:00.4349837Z #22 463.1 ptxas info : 10 bytes gmem, 80 bytes 
cmem[4] 2025-09-07T06:32:00.4354580Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.4362363Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4366782Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4367812Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:00.4368719Z #22 463.1 ptxas info : Compile time = 1035.930 ms 2025-09-07T06:32:00.4373275Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.4381349Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4385867Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4386941Z #22 463.1 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:00.4387802Z #22 463.1 ptxas info : Compile time = 1171.309 ms 2025-09-07T06:32:00.4392204Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.4400631Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4405148Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4406186Z #22 463.1 ptxas info : Used 253 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:00.4407110Z #22 463.1 ptxas info : Compile time = 1766.121 ms 2025-09-07T06:32:00.4411779Z #22 463.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.4420595Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4425116Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4426151Z #22 463.1 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:00.4427040Z #22 463.1 ptxas info : Compile time = 1216.703 ms 2025-09-07T06:32:00.4431488Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.4439592Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4444259Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4445257Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:00.4446163Z #22 463.1 ptxas info : Compile time = 1517.068 ms 2025-09-07T06:32:00.4450813Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.4459651Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4464465Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4465506Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:00.4466392Z #22 463.1 ptxas info : Compile time = 3451.251 ms 2025-09-07T06:32:00.4471068Z 
#22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.4479862Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4484569Z #22 463.1 120 bytes stack frame, 148 bytes spill stores, 256 bytes spill loads 2025-09-07T06:32:00.4485868Z #22 463.1 ptxas info : Used 255 registers, used 6 barriers, 120 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:32:00.4487017Z #22 463.1 ptxas info : Compile time = 2277.687 ms 2025-09-07T06:32:00.4491351Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.4499886Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4504373Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.4505407Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:00.4506339Z #22 463.1 ptxas info : Compile time = 2184.914 ms 2025-09-07T06:32:00.4510793Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.4518562Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4523475Z #22 463.1 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:32:00.4524706Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:00.4525861Z #22 463.1 ptxas info : Compile time = 3867.228 ms 2025-09-07T06:32:00.4530518Z #22 463.1 ptxas info : Compiling entry 
function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.4539129Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.4543608Z #22 463.1 120 bytes stack frame, 148 bytes spill stores, 160 bytes spill loads 2025-09-07T06:32:00.4544991Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 120 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:00.4546110Z #22 463.1 ptxas info : Compile time = 1212.411 ms 2025-09-07T06:32:00.4550469Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.5471087Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.5475488Z #22 463.1 40 bytes stack frame, 52 bytes spill stores, 68 bytes spill loads 2025-09-07T06:32:00.5476634Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:00.5477671Z #22 463.1 ptxas info : Compile time = 1349.677 ms 2025-09-07T06:32:00.5481800Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.5489526Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.5494578Z #22 463.1 88 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads 2025-09-07T06:32:00.5495904Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:00.5497118Z #22 
463.1 ptxas info : Compile time = 2644.000 ms 2025-09-07T06:32:00.5501622Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.5509106Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.5513295Z #22 463.1 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:32:00.5514564Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:00.5516089Z #22 463.1 ptxas info : Compile time = 2454.624 ms 2025-09-07T06:32:00.5520516Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.5528134Z #22 463.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.5532532Z #22 463.1 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:32:00.5533695Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:00.5534677Z #22 463.1 ptxas info : Compile time = 2485.131 ms 2025-09-07T06:32:00.5538917Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.5546790Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.5551367Z #22 463.1 224 bytes stack frame, 164 bytes spill stores, 628 bytes spill loads 2025-09-07T06:32:00.5552542Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 224 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:00.5553642Z 
#22 463.1 ptxas info : Compile time = 6107.822 ms 2025-09-07T06:32:00.5559655Z #22 463.1 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi48EEES4_EEELi128EN7cutlass10bfloat16_tEfNS7_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISB_NS_21CollectiveEpilogueFwdINS2_IJS4_S4_S5_EEENS2_IJNS3_ILi1EEESG_SG_EEES8_SA_Li128ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESR_EEESR_NS3_ILi16EEEEEENS2_IJNS2_IJSG_SR_EEENS3_ILi4EEENS3_ILi8EEEEEEEEEENS_7SoftmaxILi4ELi0EEEEEbRKNSB_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:32:00.5566056Z #22 463.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:00.5570826Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.5579436Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.5584074Z #22 463.1 136 bytes stack frame, 172 bytes spill stores, 312 bytes spill loads 
2025-09-07T06:32:00.5585236Z #22 463.1 ptxas info : Used 255 registers, used 6 barriers, 136 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:32:00.5586248Z #22 463.1 ptxas info : Compile time = 2956.446 ms 2025-09-07T06:32:00.5590442Z #22 463.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.5598259Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.5602528Z #22 463.1 112 bytes stack frame, 176 bytes spill stores, 212 bytes spill loads 2025-09-07T06:32:00.5603738Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:00.5604751Z #22 463.1 ptxas info : Compile time = 2465.160 ms 2025-09-07T06:32:00.5609076Z #22 463.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:00.5617530Z #22 463.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:00.5621901Z #22 463.1 216 bytes stack frame, 176 bytes spill stores, 396 bytes spill loads 2025-09-07T06:32:00.5623156Z #22 463.1 ptxas info : Used 255 registers, used 2 barriers, 216 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:00.5624238Z #22 463.1 ptxas info : Compile time = 5821.669 ms 2025-09-07T06:32:00.5630674Z #22 463.1 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi64EEES4_EEELi128EN7cutlass10bfloat16_tEfNS7_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISB_NS_21CollectiveEpilogueFwdINS2_IJS4_S4_S5_EEENS2_IJNS3_ILi1EEESG_SG_EEES8_SA_Li128ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESR_EEESR_NS3_ILi16EEEEEENS2_IJNS2_IJSG_SR_EEENS3_ILi4EEENS3_ILi8EEEEEEEEEENS_7SoftmaxILi4ELi0EEEEEbRKNSB_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:32:00.5637112Z #22 463.1 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.7204062Z #22 464.4 [43/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:01.8748738Z #22 464.4 ptxas info : 10 bytes gmem 2025-09-07T06:32:01.8752977Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8761308Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8765558Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8766444Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8767237Z #22 464.4 ptxas info : Compile time = 2.053 ms 2025-09-07T06:32:01.8771565Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8779986Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8784344Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8785224Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8785967Z #22 464.4 ptxas info : Compile time = 21.181 ms 2025-09-07T06:32:01.8790293Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8798327Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8802690Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8803598Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8804330Z #22 464.4 ptxas info : Compile time = 0.955 ms 2025-09-07T06:32:01.8808929Z #22 464.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8817315Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8822245Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8823134Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8823861Z #22 464.4 ptxas info : Compile time = 0.919 ms 2025-09-07T06:32:01.8828432Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8836740Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8841399Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8842600Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8843352Z #22 464.4 ptxas info : Compile time = 0.972 ms 2025-09-07T06:32:01.8847847Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8856075Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8860652Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8861560Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8862308Z #22 464.4 ptxas info : Compile time = 0.657 ms 2025-09-07T06:32:01.8866913Z #22 464.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8875326Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8879454Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8880176Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8880791Z #22 464.4 ptxas info : Compile time = 0.632 ms 2025-09-07T06:32:01.8884814Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8893331Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8897830Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8898767Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8899517Z #22 464.4 ptxas info : Compile time = 0.638 ms 2025-09-07T06:32:01.8904276Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8912200Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8916589Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8917463Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8918234Z #22 464.4 ptxas info : Compile time = 0.641 ms 2025-09-07T06:32:01.8922813Z #22 464.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8931259Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8936119Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8937067Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8937812Z #22 464.4 ptxas info : Compile time = 0.599 ms 2025-09-07T06:32:01.8942464Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8951264Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8955953Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8956880Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8957645Z #22 464.4 ptxas info : Compile time = 0.633 ms 2025-09-07T06:32:01.8962301Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8970835Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.8975013Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.8975930Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.8976697Z #22 464.4 ptxas info : Compile time = 0.555 ms 2025-09-07T06:32:01.8981345Z #22 464.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.8989764Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9095507Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9096443Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.9097187Z #22 464.4 ptxas info : Compile time = 0.560 ms 2025-09-07T06:32:01.9101793Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.9110538Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9115490Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9116391Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.9117203Z #22 464.4 ptxas info : Compile time = 0.579 ms 2025-09-07T06:32:01.9121852Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.9130706Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9135557Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9136482Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.9137227Z #22 464.4 ptxas info : Compile time = 0.597 ms 2025-09-07T06:32:01.9142064Z #22 464.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.9150459Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9155225Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9156138Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.9156910Z #22 464.4 ptxas info : Compile time = 0.642 ms 2025-09-07T06:32:01.9161521Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.9169860Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9174850Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9175948Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.9176676Z #22 464.4 ptxas info : Compile time = 0.597 ms 2025-09-07T06:32:01.9181190Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:01.9189356Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9194181Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9195104Z #22 464.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:01.9195912Z #22 464.4 ptxas info : Compile time = 0.604 ms 2025-09-07T06:32:01.9197026Z #22 464.4 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 
2025-09-07T06:32:01.9201292Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9209177Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9213591Z #22 464.4 16 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:32:01.9214927Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 16 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9216086Z #22 464.4 ptxas info : Compile time = 1040.563 ms 2025-09-07T06:32:01.9220411Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9228194Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9232787Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9233848Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:01.9234747Z #22 464.4 ptxas info : Compile time = 910.057 ms 2025-09-07T06:32:01.9239120Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9247120Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9251079Z #22 464.4 8 bytes stack frame, 8 bytes spill stores, 12 bytes spill loads 2025-09-07T06:32:01.9252595Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9253736Z #22 464.4 ptxas info : Compile time = 2256.556 ms 2025-09-07T06:32:01.9258628Z #22 464.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9266933Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9271566Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9272605Z #22 464.4 ptxas info : Used 246 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:01.9273559Z #22 464.4 ptxas info : Compile time = 1698.124 ms 2025-09-07T06:32:01.9278048Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9286275Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9290647Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9291703Z #22 464.4 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:01.9293084Z #22 464.4 ptxas info : Compile time = 1775.446 ms 2025-09-07T06:32:01.9297942Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9306334Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9310899Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9311994Z #22 464.4 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:01.9312916Z #22 464.4 ptxas info : Compile time = 3119.107 ms 2025-09-07T06:32:01.9317440Z #22 464.4 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9326600Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9331806Z #22 464.4 128 bytes stack frame, 148 bytes spill stores, 316 bytes spill loads 2025-09-07T06:32:01.9333434Z #22 464.4 ptxas info : Used 255 registers, used 6 barriers, 128 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:32:01.9334657Z #22 464.4 ptxas info : Compile time = 2072.164 ms 2025-09-07T06:32:01.9339592Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9348484Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9353437Z #22 464.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:01.9354567Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:01.9355544Z #22 464.4 ptxas info : Compile time = 1817.438 ms 2025-09-07T06:32:01.9360533Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9369574Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9374737Z #22 464.4 96 bytes stack frame, 132 bytes spill stores, 244 bytes spill loads 2025-09-07T06:32:01.9376146Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 96 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9377378Z #22 464.4 ptxas info : Compile time = 3593.452 ms 2025-09-07T06:32:01.9382525Z #22 464.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9392534Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9396839Z #22 464.4 88 bytes stack frame, 104 bytes spill stores, 136 bytes spill loads 2025-09-07T06:32:01.9398083Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9399165Z #22 464.4 ptxas info : Compile time = 1138.562 ms 2025-09-07T06:32:01.9403632Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9411818Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9416526Z #22 464.4 144 bytes stack frame, 164 bytes spill stores, 192 bytes spill loads 2025-09-07T06:32:01.9417839Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 144 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9418969Z #22 464.4 ptxas info : Compile time = 1235.779 ms 2025-09-07T06:32:01.9423553Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9432189Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9436926Z #22 464.4 168 bytes stack frame, 220 bytes spill stores, 260 bytes spill loads 2025-09-07T06:32:01.9438207Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 168 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9439335Z #22 464.4 ptxas 
info : Compile time = 2748.816 ms 2025-09-07T06:32:01.9443956Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9452727Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9457396Z #22 464.4 128 bytes stack frame, 168 bytes spill stores, 244 bytes spill loads 2025-09-07T06:32:01.9458674Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 128 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9459812Z #22 464.4 ptxas info : Compile time = 2423.874 ms 2025-09-07T06:32:01.9464415Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9472710Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9477301Z #22 464.4 120 bytes stack frame, 176 bytes spill stores, 248 bytes spill loads 2025-09-07T06:32:01.9478600Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 120 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9479737Z #22 464.4 ptxas info : Compile time = 2552.684 ms 2025-09-07T06:32:01.9484320Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9493106Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9498035Z #22 464.4 168 bytes stack frame, 228 bytes spill stores, 340 bytes spill loads 2025-09-07T06:32:01.9499308Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 168 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9500404Z #22 464.4 ptxas 
info : Compile time = 4559.512 ms 2025-09-07T06:32:01.9505164Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9514004Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9519186Z #22 464.4 176 bytes stack frame, 260 bytes spill stores, 656 bytes spill loads 2025-09-07T06:32:01.9520504Z #22 464.4 ptxas info : Used 255 registers, used 6 barriers, 176 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:32:01.9521621Z #22 464.4 ptxas info : Compile time = 2236.832 ms 2025-09-07T06:32:01.9526189Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9534692Z #22 464.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9539324Z #22 464.4 200 bytes stack frame, 276 bytes spill stores, 364 bytes spill loads 2025-09-07T06:32:01.9540629Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 200 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9541786Z #22 464.4 ptxas info : Compile time = 2089.003 ms 2025-09-07T06:32:01.9546338Z #22 464.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:01.9554633Z #22 464.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:01.9559457Z #22 464.4 160 bytes stack frame, 240 bytes spill stores, 328 bytes spill loads 2025-09-07T06:32:01.9560844Z #22 464.4 ptxas info : Used 255 registers, used 2 barriers, 160 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:01.9561967Z #22 464.4 ptxas 
info : Compile time = 4603.492 ms 2025-09-07T06:32:25.6723599Z #22 488.3 [44/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:25.8331078Z #22 488.3 ptxas info : 10 bytes gmem 2025-09-07T06:32:25.8334560Z #22 488.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8340403Z #22 488.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8343647Z #22 488.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8344315Z #22 488.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8344901Z #22 488.3 ptxas info : Compile time = 32.401 ms 2025-09-07T06:32:25.8348234Z #22 488.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8354326Z #22 488.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8357663Z #22 488.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8358329Z #22 488.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8358892Z #22 488.3 ptxas info : Compile time = 1.041 ms 2025-09-07T06:32:25.8362096Z #22 488.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8368181Z #22 488.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8371422Z #22 488.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8372648Z #22 488.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8373515Z #22 488.3 ptxas info : Compile time = 0.716 ms 2025-09-07T06:32:25.8378640Z #22 488.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8390019Z #22 488.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8396341Z #22 488.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8397413Z #22 488.3 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8398235Z #22 488.3 ptxas info : Compile time = 0.875 ms 2025-09-07T06:32:25.8403591Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8413713Z #22 488.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8419576Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8420646Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8421420Z #22 488.4 ptxas info : Compile time = 1.044 ms 2025-09-07T06:32:25.8426894Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8436167Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8439584Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8440253Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8442793Z #22 488.4 ptxas info : Compile time = 0.635 ms 2025-09-07T06:32:25.8446171Z #22 488.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8452264Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8455845Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8456532Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8457086Z #22 488.4 ptxas info : Compile time = 0.615 ms 2025-09-07T06:32:25.8460353Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8466127Z #22 488.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8469319Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8470164Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8470708Z #22 488.4 ptxas info : Compile time = 0.625 ms 2025-09-07T06:32:25.8473921Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8479821Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8483006Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8483664Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8484265Z #22 488.4 ptxas info : Compile time = 0.628 ms 2025-09-07T06:32:25.8489652Z #22 488.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8501130Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8507061Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8508138Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8508999Z #22 488.4 ptxas info : Compile time = 0.619 ms 2025-09-07T06:32:25.8514334Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8524104Z #22 488.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8529582Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8530615Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8531483Z #22 488.4 ptxas info : Compile time = 0.615 ms 2025-09-07T06:32:25.8537107Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8547272Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8552475Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8553162Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8553744Z #22 488.4 ptxas info : Compile time = 0.631 ms 2025-09-07T06:32:25.8557069Z #22 488.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8563509Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8566862Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8567544Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8568093Z #22 488.4 ptxas info : Compile time = 0.615 ms 2025-09-07T06:32:25.8571488Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8577852Z #22 488.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8581197Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8581859Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8582422Z #22 488.4 ptxas info : Compile time = 0.613 ms 2025-09-07T06:32:25.8585789Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8692839Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8697049Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8697740Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8698290Z #22 488.4 ptxas info : Compile time = 0.636 ms 2025-09-07T06:32:25.8701949Z #22 488.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8708359Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8712195Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8712874Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8713438Z #22 488.4 ptxas info : Compile time = 0.686 ms 2025-09-07T06:32:25.8716817Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8722934Z #22 488.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8726984Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8728045Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8728894Z #22 488.4 ptxas info : Compile time = 0.615 ms 2025-09-07T06:32:25.8734526Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:25.8745781Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8751731Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8752764Z #22 488.4 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:32:25.8753816Z #22 488.4 ptxas info : Compile time = 0.611 ms 2025-09-07T06:32:25.8754633Z #22 488.4 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 
2025-09-07T06:32:25.8759737Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8769149Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8774654Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8775775Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:25.8776812Z #22 488.4 ptxas info : Compile time = 924.074 ms 2025-09-07T06:32:25.8782265Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8791655Z #22 488.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8795788Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8796581Z #22 488.4 ptxas info : Used 252 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:25.8797232Z #22 488.4 ptxas info : Compile time = 996.326 ms 2025-09-07T06:32:25.8800451Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8806254Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8809429Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8810194Z #22 488.4 ptxas info : Used 253 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:25.8811115Z #22 488.4 ptxas info : Compile time = 1848.656 ms 2025-09-07T06:32:25.8814607Z #22 488.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8820899Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8824255Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8825005Z #22 488.4 ptxas info : Used 254 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:25.8825685Z #22 488.4 ptxas info : Compile time = 1691.071 ms 2025-09-07T06:32:25.8829296Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8835370Z #22 488.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8838734Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8839494Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:25.8840145Z #22 488.4 ptxas info : Compile time = 2186.777 ms 2025-09-07T06:32:25.8843800Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8854093Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEENS7_ILi96EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8860243Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8861497Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:25.8862606Z #22 488.4 ptxas info : Compile time = 3813.908 ms 2025-09-07T06:32:25.8868384Z #22 488.4 ptxas 
info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8878546Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8884020Z #22 488.4 120 bytes stack frame, 148 bytes spill stores, 256 bytes spill loads 2025-09-07T06:32:25.8885542Z #22 488.4 ptxas info : Used 255 registers, used 6 barriers, 120 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:32:25.8886846Z #22 488.4 ptxas info : Compile time = 2677.047 ms 2025-09-07T06:32:25.8892225Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8900779Z #22 488.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8905791Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.8906901Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:32:25.8907900Z #22 488.4 ptxas info : Compile time = 2219.985 ms 2025-09-07T06:32:25.8912935Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8921733Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb1EN4cute5tupleIJNS5_1CILi128EEES8_S8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdIS9_NS6_IJNS7_ILi1EEESF_SF_EEESA_SC_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8926864Z #22 488.4 8 bytes stack frame, 4 bytes spill stores, 8 bytes spill loads 2025-09-07T06:32:25.8928308Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 8 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:25.8929559Z #22 488.4 ptxas info : Compile time = 4271.887 ms 2025-09-07T06:32:25.8935080Z #22 488.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8944937Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8950248Z #22 488.4 120 bytes stack frame, 148 bytes spill stores, 160 bytes spill loads 2025-09-07T06:32:25.8951721Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 120 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:25.8953016Z #22 488.4 ptxas info : Compile time = 1316.968 ms 2025-09-07T06:32:25.8958307Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8967682Z #22 488.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8973143Z #22 488.4 40 bytes stack frame, 52 bytes spill stores, 68 bytes spill loads 2025-09-07T06:32:25.8974568Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 40 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:25.8975849Z #22 488.4 ptxas info : Compile time = 1360.618 ms 2025-09-07T06:32:25.8981118Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.8990613Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.8996261Z #22 488.4 88 bytes stack frame, 124 bytes spill stores, 124 bytes spill loads 2025-09-07T06:32:25.8997746Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 88 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:25.8999052Z #22 488.4 ptxas info : 
Compile time = 2738.697 ms 2025-09-07T06:32:25.9004187Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.9013967Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.9019547Z #22 488.4 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:32:25.9020974Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:25.9022225Z #22 488.4 ptxas info : Compile time = 2524.175 ms 2025-09-07T06:32:25.9027509Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.9037020Z #22 488.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.9042545Z #22 488.4 112 bytes stack frame, 156 bytes spill stores, 196 bytes spill loads 2025-09-07T06:32:25.9044043Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:25.9045367Z #22 488.4 ptxas info : Compile time = 2810.330 ms 2025-09-07T06:32:25.9050670Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.9060224Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi48EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.9065533Z #22 488.4 224 bytes stack frame, 164 bytes spill stores, 628 bytes spill loads 2025-09-07T06:32:25.9067001Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 224 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:25.9068319Z #22 488.4 ptxas 
info : Compile time = 6174.002 ms 2025-09-07T06:32:25.9075802Z #22 488.4 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi48EEES4_EEELi128EN7cutlass6half_tEfNS7_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISB_NS_21CollectiveEpilogueFwdINS2_IJS4_S4_S5_EEENS2_IJNS3_ILi1EEESG_SG_EEES8_SA_Li128ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESR_EEESR_NS3_ILi16EEEEEENS2_IJNS2_IJSG_SR_EEENS3_ILi4EEENS3_ILi8EEEEEEEEEENS_7SoftmaxILi4ELi0EEEEEbRKNSB_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:32:25.9083484Z #22 488.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:25.9089098Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.9100668Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi128ELi128ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.9107299Z #22 488.4 136 bytes stack frame, 172 bytes spill stores, 312 bytes spill loads 2025-09-07T06:32:25.9108948Z #22 488.4 ptxas info : 
Used 255 registers, used 6 barriers, 136 bytes cumulative stack size, 1416 bytes cmem[0] 2025-09-07T06:32:25.9110371Z #22 488.4 ptxas info : Compile time = 2512.589 ms 2025-09-07T06:32:25.9116256Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.9124077Z #22 488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.9127467Z #22 488.4 112 bytes stack frame, 176 bytes spill stores, 212 bytes spill loads 2025-09-07T06:32:25.9128430Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:25.9129263Z #22 488.4 ptxas info : Compile time = 2249.401 ms 2025-09-07T06:32:25.9132815Z #22 488.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:25.9138836Z #22 
488.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi4ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEES8_EEELi128ENS_6half_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_S8_S9_EEENS6_IJNS7_ILi1EEESH_SH_EEESB_SD_Li128ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:25.9142231Z #22 488.4 112 bytes stack frame, 168 bytes spill stores, 236 bytes spill loads 2025-09-07T06:32:25.9143183Z #22 488.4 ptxas info : Used 255 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:32:25.9144031Z #22 488.4 ptxas info : Compile time = 4333.235 ms 2025-09-07T06:32:43.8882365Z #22 506.6 [45/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ 
-U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:43.8901710Z #22 506.6 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:32:43.8906798Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:43.8916054Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.8921895Z #22 506.6 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.8922920Z #22 506.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:43.8923838Z #22 506.6 ptxas info : Compile time = 1.608 ms 2025-09-07T06:32:43.8929245Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:43.8939480Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.8945268Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.8946312Z #22 506.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:43.8947180Z #22 506.6 ptxas info : Compile time = 0.808 ms 2025-09-07T06:32:43.8952661Z #22 506.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:43.8962974Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.8968390Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.8969412Z #22 506.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:43.8970297Z #22 506.6 ptxas info : Compile time = 0.885 ms 2025-09-07T06:32:43.8975456Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:43.8984535Z #22 506.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.8989539Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.8990580Z #22 506.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:43.8991466Z #22 506.6 ptxas info : Compile time = 0.627 ms 2025-09-07T06:32:43.8997271Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:43.9007277Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9013244Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.9014261Z #22 506.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes 
cmem[0] 2025-09-07T06:32:43.9015156Z #22 506.6 ptxas info : Compile time = 0.646 ms 2025-09-07T06:32:43.9020501Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:43.9030877Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9036283Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.9037317Z #22 506.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:43.9038208Z #22 506.6 ptxas info : Compile time = 0.607 ms 2025-09-07T06:32:43.9043196Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:32:43.9052234Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9057391Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.9058406Z #22 506.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:43.9059308Z #22 506.6 ptxas info : Compile time = 0.585 ms 2025-09-07T06:32:43.9064637Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:43.9074435Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9080090Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:32:43.9081121Z #22 506.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:43.9081987Z #22 506.6 ptxas info : Compile time = 0.580 ms 2025-09-07T06:32:43.9087346Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:43.9198501Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9203926Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.9204964Z #22 506.6 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:43.9205864Z #22 506.6 ptxas info : Compile time = 0.603 ms 2025-09-07T06:32:43.9206522Z #22 506.6 ptxas info : 10 bytes gmem 2025-09-07T06:32:43.9211546Z #22 506.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:43.9220875Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9225939Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.9226857Z #22 506.6 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:32:43.9227663Z #22 506.6 ptxas info : Compile time = 493.458 ms 2025-09-07T06:32:43.9233066Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:43.9243561Z #22 506.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9249124Z #22 506.6 16 bytes stack frame, 56 bytes spill stores, 48 bytes spill loads 2025-09-07T06:32:43.9250321Z #22 506.6 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:32:43.9251340Z #22 506.6 ptxas info : Compile time = 774.932 ms 2025-09-07T06:32:43.9256982Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:43.9267185Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9272814Z #22 506.6 48 bytes stack frame, 88 bytes spill stores, 120 bytes spill loads 2025-09-07T06:32:43.9274033Z #22 506.6 ptxas 
info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:32:43.9275083Z #22 506.6 ptxas info : Compile time = 1841.553 ms 2025-09-07T06:32:43.9279976Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:43.9289065Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9294513Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.9295401Z #22 506.6 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:32:43.9296161Z #22 506.6 ptxas info : Compile time = 1131.447 ms 2025-09-07T06:32:43.9301461Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 
'sm_90a' 2025-09-07T06:32:43.9311622Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9317204Z #22 506.6 16 bytes stack frame, 44 bytes spill stores, 36 bytes spill loads 2025-09-07T06:32:43.9318362Z #22 506.6 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:32:43.9319546Z #22 506.6 ptxas info : Compile time = 1503.066 ms 2025-09-07T06:32:43.9324263Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:43.9333137Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9337527Z #22 506.6 56 bytes stack frame, 92 
bytes spill stores, 136 bytes spill loads 2025-09-07T06:32:43.9338486Z #22 506.6 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:32:43.9339308Z #22 506.6 ptxas info : Compile time = 2986.899 ms 2025-09-07T06:32:43.9343366Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:43.9350882Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9355059Z #22 506.6 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:43.9355841Z #22 506.6 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:32:43.9356593Z #22 506.6 ptxas info : Compile time = 776.493 ms 2025-09-07T06:32:43.9361881Z #22 506.6 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:43.9371965Z #22 506.6 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9377682Z #22 506.6 16 bytes stack frame, 44 bytes spill stores, 32 bytes spill loads 2025-09-07T06:32:43.9378864Z #22 506.6 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:32:43.9379849Z #22 506.6 ptxas info : Compile time = 1123.466 ms 2025-09-07T06:32:43.9385189Z #22 506.6 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:43.9395825Z #22 506.6 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:43.9401338Z #22 506.6 48 bytes stack frame, 92 bytes spill stores, 124 bytes spill loads 2025-09-07T06:32:43.9402554Z #22 506.6 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:32:43.9403597Z #22 506.6 ptxas info : Compile time = 2440.135 ms 2025-09-07T06:32:57.6034619Z #22 520.3 [46/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ 
--extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:32:57.6054047Z #22 520.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:32:57.6059154Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:57.6068125Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6073379Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads 2025-09-07T06:32:57.6074475Z #22 520.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:57.6075371Z #22 520.3 ptxas info : Compile time = 1.819 ms 2025-09-07T06:32:57.6081324Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:57.6090783Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6096475Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:57.6097612Z #22 520.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:57.6098525Z #22 520.3 ptxas info : Compile time = 0.863 ms 2025-09-07T06:32:57.6104157Z #22 520.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:57.6114483Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6120342Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:57.6121402Z #22 520.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:57.6122367Z #22 520.3 ptxas info : Compile time = 21.019 ms 2025-09-07T06:32:57.6127485Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:57.6137408Z #22 520.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6143119Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:57.6144121Z #22 520.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:32:57.6144977Z #22 520.3 ptxas info : Compile time = 0.675 ms 2025-09-07T06:32:57.6150956Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:57.6161084Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6184373Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:57.6185643Z #22 520.3 ptxas info : Used 4 registers, used 0 barriers, 2560 
bytes cmem[0] 2025-09-07T06:32:57.6186639Z #22 520.3 ptxas info : Compile time = 0.577 ms 2025-09-07T06:32:57.6192441Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:57.6202289Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6207552Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:57.6208573Z #22 520.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:57.6209416Z #22 520.3 ptxas info : Compile time = 0.546 ms 2025-09-07T06:32:57.6214495Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:32:57.6224959Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6229792Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:57.6230908Z #22 520.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:32:57.6231861Z #22 520.3 ptxas info : Compile time = 0.530 ms 2025-09-07T06:32:57.6237359Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:57.6247306Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6252980Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:32:57.6254124Z #22 520.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:57.6255086Z #22 520.3 ptxas info : Compile time = 0.533 ms 2025-09-07T06:32:57.6260662Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:32:57.6269509Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6274437Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:57.6275362Z #22 520.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:32:57.6276265Z #22 520.3 ptxas info : Compile time = 0.593 ms 2025-09-07T06:32:57.6276930Z #22 520.3 ptxas info : 10 bytes gmem 2025-09-07T06:32:57.6281775Z #22 520.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:57.6290702Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6296299Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:57.6297319Z #22 520.3 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:32:57.6298167Z #22 520.3 ptxas info : Compile time = 567.294 ms 2025-09-07T06:32:57.6304140Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:57.6314640Z #22 520.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6320541Z #22 520.3 16 bytes stack frame, 52 bytes spill stores, 44 bytes spill loads 2025-09-07T06:32:57.6321826Z #22 520.3 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:32:57.6322895Z #22 520.3 ptxas info : Compile time = 728.641 ms 2025-09-07T06:32:57.6328089Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:57.6337744Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6343273Z #22 520.3 40 bytes stack frame, 68 bytes spill stores, 112 bytes spill loads 2025-09-07T06:32:57.6344426Z #22 520.3 ptxas info : Used 168 
registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:32:57.6345422Z #22 520.3 ptxas info : Compile time = 1870.700 ms 2025-09-07T06:32:57.6350665Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:57.6360135Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6365154Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:57.6366116Z #22 520.3 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:32:57.6366932Z #22 520.3 ptxas info : Compile time = 1208.163 ms 2025-09-07T06:32:57.6372623Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 
'sm_90a' 2025-09-07T06:32:57.6382635Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6388419Z #22 520.3 16 bytes stack frame, 44 bytes spill stores, 36 bytes spill loads 2025-09-07T06:32:57.6389701Z #22 520.3 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:32:57.6390681Z #22 520.3 ptxas info : Compile time = 1370.210 ms 2025-09-07T06:32:57.6396218Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:57.6406556Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6412751Z #22 520.3 40 bytes stack frame, 76 bytes spill 
stores, 128 bytes spill loads 2025-09-07T06:32:57.6413909Z #22 520.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:32:57.6415063Z #22 520.3 ptxas info : Compile time = 2729.695 ms 2025-09-07T06:32:57.6419937Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:57.6429248Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6434491Z #22 520.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:32:57.6435474Z #22 520.3 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:32:57.6436224Z #22 520.3 ptxas info : Compile time = 871.452 ms 2025-09-07T06:32:57.6441410Z #22 520.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:57.6450974Z #22 520.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6456651Z #22 520.3 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:32:57.6457850Z #22 520.3 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:32:57.6458846Z #22 520.3 ptxas info : Compile time = 954.250 ms 2025-09-07T06:32:57.6464045Z #22 520.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:32:57.6474040Z #22 520.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:32:57.6479791Z #22 520.3 40 bytes stack frame, 72 bytes spill stores, 112 bytes spill loads 2025-09-07T06:32:57.6480982Z #22 520.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:32:57.6482136Z #22 520.3 ptxas info : Compile time = 2169.682 ms 2025-09-07T06:33:06.7454312Z #22 529.4 [47/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ 
-U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:06.7474219Z #22 529.4 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:33:06.7479773Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:06.7489203Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7496339Z #22 529.4 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7496970Z #22 529.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:06.7497525Z #22 529.4 ptxas info : Compile time = 1.772 ms 2025-09-07T06:33:06.7501435Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:06.7509588Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7512533Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7513148Z #22 529.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:06.7513669Z #22 529.4 ptxas info : Compile time = 0.919 ms 2025-09-07T06:33:06.7516882Z #22 529.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:06.7523320Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7526669Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7527287Z #22 529.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:06.7527813Z #22 529.4 ptxas info : Compile time = 21.112 ms 2025-09-07T06:33:06.7530598Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:06.7535612Z #22 529.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7538316Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7538905Z #22 529.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:06.7539430Z #22 529.4 ptxas info : Compile time = 0.715 ms 2025-09-07T06:33:06.7542329Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:06.7547731Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7550643Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7551242Z #22 529.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 
2025-09-07T06:33:06.7551769Z #22 529.4 ptxas info : Compile time = 0.632 ms 2025-09-07T06:33:06.7554686Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:06.7560074Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7562952Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7563552Z #22 529.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:06.7564066Z #22 529.4 ptxas info : Compile time = 0.595 ms 2025-09-07T06:33:06.7566814Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:06.7571687Z #22 529.4 ptxas 
info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7574500Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7575105Z #22 529.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:06.7575626Z #22 529.4 ptxas info : Compile time = 0.586 ms 2025-09-07T06:33:06.7578512Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:06.7583836Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7586761Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7587351Z #22 529.4 ptxas info : Used 4 registers, used 0 
barriers, 2560 bytes cmem[0] 2025-09-07T06:33:06.7587879Z #22 529.4 ptxas info : Compile time = 0.584 ms 2025-09-07T06:33:06.7590771Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:06.7596493Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7599386Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7599991Z #22 529.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:06.7600517Z #22 529.4 ptxas info : Compile time = 0.583 ms 2025-09-07T06:33:06.7600897Z #22 529.4 ptxas info : 10 bytes gmem 2025-09-07T06:33:06.7603633Z #22 529.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:06.7608768Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7611811Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7612518Z #22 529.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:06.7612988Z #22 529.4 ptxas info : Compile time = 571.384 ms 2025-09-07T06:33:06.7616272Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:06.7622067Z #22 529.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7624979Z #22 529.4 16 bytes stack frame, 56 bytes spill stores, 48 bytes spill loads 2025-09-07T06:33:06.7625653Z #22 529.4 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:33:06.7626230Z #22 529.4 ptxas info : Compile time = 860.110 ms 2025-09-07T06:33:06.7629153Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:06.7634611Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7637596Z #22 529.4 48 bytes stack frame, 88 bytes spill stores, 120 bytes spill loads 2025-09-07T06:33:06.7638551Z #22 529.4 ptxas info : Used 168 
registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:06.7639646Z #22 529.4 ptxas info : Compile time = 2008.749 ms 2025-09-07T06:33:06.7644885Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:06.7655035Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7660247Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7661294Z #22 529.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:06.7662197Z #22 529.4 ptxas info : Compile time = 1161.582 ms 2025-09-07T06:33:06.7667844Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:33:06.7678979Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7684441Z #22 529.4 16 bytes stack frame, 44 bytes spill stores, 36 bytes spill loads 2025-09-07T06:33:06.7685594Z #22 529.4 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:33:06.7686491Z #22 529.4 ptxas info : Compile time = 1541.772 ms 2025-09-07T06:33:06.7691455Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:06.7702572Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7708240Z #22 529.4 56 bytes stack frame, 92 bytes spill stores, 136 
bytes spill loads 2025-09-07T06:33:06.7709516Z #22 529.4 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:33:06.7710657Z #22 529.4 ptxas info : Compile time = 3033.551 ms 2025-09-07T06:33:06.7716400Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:06.7725966Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7731471Z #22 529.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:06.7732684Z #22 529.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:06.7733578Z #22 529.4 ptxas info : Compile time = 916.820 ms 2025-09-07T06:33:06.7739453Z #22 529.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:06.7750076Z #22 529.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7755279Z #22 529.4 16 bytes stack frame, 44 bytes spill stores, 32 bytes spill loads 2025-09-07T06:33:06.7756539Z #22 529.4 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:33:06.7757625Z #22 529.4 ptxas info : Compile time = 1098.761 ms 2025-09-07T06:33:06.7762654Z #22 529.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:06.7773388Z #22 529.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:06.7779171Z #22 529.4 48 bytes stack frame, 92 bytes spill stores, 124 bytes spill loads 2025-09-07T06:33:06.7780325Z #22 529.4 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:06.7781437Z #22 529.4 ptxas info : Compile time = 2406.571 ms 2025-09-07T06:33:07.6542597Z #22 530.3 [48/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda 
-D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:07.8125380Z #22 530.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:33:07.8129842Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:07.8137973Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8142776Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:07.8143745Z #22 530.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:33:07.8144786Z #22 530.3 ptxas info : Compile time = 1.822 ms 2025-09-07T06:33:07.8149104Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:07.8157474Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8161874Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8162802Z #22 530.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:33:07.8163645Z #22 530.3 ptxas info : Compile time = 0.807 ms 2025-09-07T06:33:07.8169853Z #22 530.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:07.8178217Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8182975Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8183875Z #22 530.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:07.8184750Z #22 530.3 ptxas info : Compile time = 0.698 ms 2025-09-07T06:33:07.8189343Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:07.8200670Z #22 530.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8206728Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8207900Z #22 530.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:07.8208910Z #22 530.3 ptxas info : Compile time = 0.760 ms 2025-09-07T06:33:07.8215084Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:07.8226208Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8232200Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8233355Z #22 530.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 
2025-09-07T06:33:07.8234364Z #22 530.3 ptxas info : Compile time = 0.550 ms 2025-09-07T06:33:07.8240510Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:07.8251938Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8258372Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8259529Z #22 530.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:07.8260678Z #22 530.3 ptxas info : Compile time = 0.546 ms 2025-09-07T06:33:07.8266833Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:33:07.8278162Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8284367Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8285553Z #22 530.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:07.8286647Z #22 530.3 ptxas info : Compile time = 0.534 ms 2025-09-07T06:33:07.8292873Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:07.8303755Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8309646Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8310812Z #22 530.3 
ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:33:07.8311815Z #22 530.3 ptxas info : Compile time = 0.531 ms 2025-09-07T06:33:07.8316042Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:07.8324013Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8328452Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8329341Z #22 530.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:07.8330221Z #22 530.3 ptxas info : Compile time = 0.529 ms 2025-09-07T06:33:07.8334676Z #22 530.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:07.8342660Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8346974Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8347852Z #22 530.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:07.8348804Z #22 530.3 ptxas info : Compile time = 0.582 ms 2025-09-07T06:33:07.8349339Z #22 530.3 ptxas info : 10 bytes gmem 2025-09-07T06:33:07.8353243Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:07.8360531Z #22 530.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8364407Z #22 530.3 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:33:07.8365352Z #22 530.3 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:33:07.8366186Z #22 530.3 ptxas info : Compile time = 689.164 ms 2025-09-07T06:33:07.8370148Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:07.8377827Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8381993Z #22 530.3 24 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:33:07.8382892Z #22 530.3 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:33:07.8383784Z #22 530.3 
ptxas info : Compile time = 848.099 ms 2025-09-07T06:33:07.8387963Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:07.8395846Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8399957Z #22 530.3 40 bytes stack frame, 76 bytes spill stores, 84 bytes spill loads 2025-09-07T06:33:07.8400855Z #22 530.3 ptxas info : Used 168 registers, used 9 barriers, 40 bytes cumulative stack size 2025-09-07T06:33:07.8401816Z #22 530.3 ptxas info : Compile time = 879.013 ms 2025-09-07T06:33:07.8405967Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:07.8413778Z #22 530.3 ptxas info 
: Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8417921Z #22 530.3 56 bytes stack frame, 280 bytes spill stores, 304 bytes spill loads 2025-09-07T06:33:07.8418861Z #22 530.3 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:33:07.8419686Z #22 530.3 ptxas info : Compile time = 2131.635 ms 2025-09-07T06:33:07.8423816Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:07.8431621Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8436026Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8436925Z #22 530.3 ptxas info : 
Used 168 registers, used 9 barriers 2025-09-07T06:33:07.8437620Z #22 530.3 ptxas info : Compile time = 1097.336 ms 2025-09-07T06:33:07.8442048Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:07.8449607Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8453922Z #22 530.3 32 bytes stack frame, 144 bytes spill stores, 164 bytes spill loads 2025-09-07T06:33:07.8454830Z #22 530.3 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:33:07.8455723Z #22 530.3 ptxas info : Compile time = 1410.728 ms 2025-09-07T06:33:07.8459851Z #22 530.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:07.8467615Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8471795Z #22 530.3 40 bytes stack frame, 232 bytes spill stores, 292 bytes spill loads 2025-09-07T06:33:07.8472709Z #22 530.3 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:33:07.8473500Z #22 530.3 ptxas info : Compile time = 2614.649 ms 2025-09-07T06:33:07.8477453Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:07.8484631Z #22 530.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8488688Z #22 530.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:07.8489527Z #22 530.3 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:07.8490124Z #22 530.3 ptxas info : Compile time = 977.621 ms 2025-09-07T06:33:07.8495500Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:07.8506738Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8512701Z #22 530.3 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:33:07.8514028Z #22 530.3 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 
2025-09-07T06:33:07.8515335Z #22 530.3 ptxas info : Compile time = 1189.990 ms 2025-09-07T06:33:07.8521478Z #22 530.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:07.8532975Z #22 530.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:07.8538947Z #22 530.3 48 bytes stack frame, 252 bytes spill stores, 268 bytes spill loads 2025-09-07T06:33:07.8540313Z #22 530.3 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:07.8541460Z #22 530.3 ptxas info : Compile time = 2345.954 ms 2025-09-07T06:33:11.6491726Z #22 534.3 [49/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_bf16_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:11.8053652Z #22 534.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:33:11.8059571Z #22 534.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:11.8069793Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8075387Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8076558Z #22 534.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:11.8077561Z #22 534.3 ptxas info : Compile time = 1.800 ms 2025-09-07T06:33:11.8083677Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:11.8095147Z #22 534.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8102937Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8104153Z #22 534.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:11.8105198Z #22 534.3 ptxas info : Compile time = 0.917 ms 2025-09-07T06:33:11.8111223Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:11.8122798Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8128843Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8130013Z #22 534.3 ptxas info : Used 4 registers, used 0 barriers, 2560 
bytes cmem[0] 2025-09-07T06:33:11.8131008Z #22 534.3 ptxas info : Compile time = 0.876 ms 2025-09-07T06:33:11.8136871Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:11.8147470Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8153127Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8154312Z #22 534.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:11.8155293Z #22 534.3 ptxas info : Compile time = 0.609 ms 2025-09-07T06:33:11.8161489Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:11.8175358Z #22 534.3 ptxas 
info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8181513Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8182674Z #22 534.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:11.8183664Z #22 534.3 ptxas info : Compile time = 0.599 ms 2025-09-07T06:33:11.8189737Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:11.8203556Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8209624Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8210806Z #22 534.3 
ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:11.8211812Z #22 534.3 ptxas info : Compile time = 0.580 ms 2025-09-07T06:33:11.8217384Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:11.8227694Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8233278Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8234452Z #22 534.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:11.8235452Z #22 534.3 ptxas info : Compile time = 0.583 ms 2025-09-07T06:33:11.8241547Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:33:11.8252761Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8258790Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8259903Z #22 534.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:11.8260910Z #22 534.3 ptxas info : Compile time = 0.552 ms 2025-09-07T06:33:11.8266799Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:11.8277960Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8283922Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:11.8285071Z #22 534.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:11.8286057Z #22 534.3 ptxas info : Compile time = 0.595 ms 2025-09-07T06:33:11.8286762Z #22 534.3 ptxas info : 10 bytes gmem 2025-09-07T06:33:11.8292686Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:11.8303127Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8308754Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8309793Z #22 534.3 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:11.8310681Z #22 534.3 ptxas info : Compile time = 923.713 ms 2025-09-07T06:33:11.8316835Z #22 534.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:11.8327988Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8334055Z #22 534.3 64 bytes stack frame, 144 bytes spill stores, 180 bytes spill loads 2025-09-07T06:33:11.8335378Z #22 534.3 ptxas info : Used 168 registers, used 9 barriers, 64 bytes cumulative stack size 2025-09-07T06:33:11.8336510Z #22 534.3 ptxas info : Compile time = 1051.377 ms 2025-09-07T06:33:11.8342473Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:11.8353412Z #22 534.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8359484Z #22 534.3 80 bytes stack frame, 296 bytes spill stores, 344 bytes spill loads 2025-09-07T06:33:11.8360813Z #22 534.3 ptxas info : Used 168 registers, used 16 barriers, 80 bytes cumulative stack size 2025-09-07T06:33:11.8361982Z #22 534.3 ptxas info : Compile time = 2151.871 ms 2025-09-07T06:33:11.8367719Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:11.8378410Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8384099Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8385132Z #22 534.3 ptxas info : Used 168 registers, used 9 barriers 
2025-09-07T06:33:11.8386012Z #22 534.3 ptxas info : Compile time = 1272.964 ms 2025-09-07T06:33:11.8392630Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:11.8403813Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8409836Z #22 534.3 104 bytes stack frame, 200 bytes spill stores, 240 bytes spill loads 2025-09-07T06:33:11.8411184Z #22 534.3 ptxas info : Used 168 registers, used 9 barriers, 104 bytes cumulative stack size 2025-09-07T06:33:11.8412461Z #22 534.3 ptxas info : Compile time = 1931.835 ms 2025-09-07T06:33:11.8418653Z #22 534.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:11.8430250Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8436353Z #22 534.3 88 bytes stack frame, 348 bytes spill stores, 420 bytes spill loads 2025-09-07T06:33:11.8437668Z #22 534.3 ptxas info : Used 168 registers, used 16 barriers, 88 bytes cumulative stack size 2025-09-07T06:33:11.8438812Z #22 534.3 ptxas info : Compile time = 2930.888 ms 2025-09-07T06:33:11.8444474Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:11.8454925Z #22 534.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8460536Z #22 534.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:11.8461574Z #22 534.3 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:11.8462466Z #22 534.3 ptxas info : Compile time = 1025.471 ms 2025-09-07T06:33:11.8468560Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:11.8479552Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8485564Z #22 534.3 64 bytes stack frame, 148 bytes spill stores, 176 bytes spill loads 2025-09-07T06:33:11.8486901Z #22 534.3 ptxas info : Used 168 registers, used 9 barriers, 64 bytes cumulative stack size 
2025-09-07T06:33:11.8488039Z #22 534.3 ptxas info : Compile time = 1437.592 ms 2025-09-07T06:33:11.8494523Z #22 534.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:11.8505802Z #22 534.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:11.8511854Z #22 534.3 80 bytes stack frame, 288 bytes spill stores, 328 bytes spill loads 2025-09-07T06:33:11.8513187Z #22 534.3 ptxas info : Used 168 registers, used 16 barriers, 80 bytes cumulative stack size 2025-09-07T06:33:11.8514333Z #22 534.3 ptxas info : Compile time = 2568.203 ms 2025-09-07T06:33:22.2149431Z #22 544.9 [50/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:22.3738505Z #22 544.9 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:33:22.3742781Z #22 544.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:22.3749856Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3754254Z #22 544.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:22.3755063Z #22 544.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:22.3755778Z #22 544.9 ptxas info : Compile time = 1.794 ms 2025-09-07T06:33:22.3759966Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:22.3768419Z #22 544.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3773373Z #22 544.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:22.3774609Z #22 544.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:22.3775425Z #22 544.9 ptxas info : Compile time = 0.872 ms 2025-09-07T06:33:22.3780329Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:22.3789622Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3794951Z #22 544.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:22.3795932Z #22 544.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:22.3796789Z 
#22 544.9 ptxas info : Compile time = 0.807 ms 2025-09-07T06:33:22.3801731Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:22.3811010Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3816380Z #22 544.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:22.3817474Z #22 544.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:22.3818280Z #22 544.9 ptxas info : Compile time = 0.581 ms 2025-09-07T06:33:22.3823204Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:22.3832183Z #22 544.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3837170Z #22 544.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:22.3838168Z #22 544.9 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:33:22.3839168Z #22 544.9 ptxas info : Compile time = 0.526 ms 2025-09-07T06:33:22.3844209Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:22.3854024Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3859135Z #22 544.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:22.3860148Z #22 544.9 ptxas info : Used 4 registers, used 0 barriers, 2560 
bytes cmem[0] 2025-09-07T06:33:22.3860975Z #22 544.9 ptxas info : Compile time = 0.518 ms 2025-09-07T06:33:22.3866024Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:22.3875344Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3880595Z #22 544.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:22.3881627Z #22 544.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:22.3882510Z #22 544.9 ptxas info : Compile time = 0.624 ms 2025-09-07T06:33:22.3887357Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:33:22.3898686Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3903597Z #22 544.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:22.3904576Z #22 544.9 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:33:22.3905613Z #22 544.9 ptxas info : Compile time = 0.509 ms 2025-09-07T06:33:22.3910677Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:22.3919976Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3924953Z #22 544.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:22.3925966Z #22 544.9 ptxas 
info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:22.3926821Z #22 544.9 ptxas info : Compile time = 0.520 ms 2025-09-07T06:33:22.3931828Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:22.3941271Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3946630Z #22 544.9 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:22.3947770Z #22 544.9 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:22.3948606Z #22 544.9 ptxas info : Compile time = 0.509 ms 2025-09-07T06:33:22.3949243Z #22 544.9 ptxas info : 10 bytes gmem 2025-09-07T06:33:22.3953808Z #22 544.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:22.3962015Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3966577Z #22 544.9 24 bytes stack frame, 52 bytes spill stores, 56 bytes spill loads 2025-09-07T06:33:22.3967680Z #22 544.9 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:33:22.3968767Z #22 544.9 ptxas info : Compile time = 710.099 ms 2025-09-07T06:33:22.3973594Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:22.3982186Z #22 544.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.3986871Z #22 544.9 32 bytes stack frame, 100 bytes spill stores, 104 bytes spill loads 2025-09-07T06:33:22.3987984Z #22 544.9 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:33:22.3988948Z #22 544.9 ptxas info : Compile time = 741.151 ms 2025-09-07T06:33:22.3994146Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:22.4003361Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.4008538Z #22 544.9 56 bytes stack frame, 204 bytes spill stores, 220 bytes spill loads 2025-09-07T06:33:22.4009628Z #22 544.9 ptxas info : Used 168 registers, used 9 barriers, 56 bytes cumulative 
stack size 2025-09-07T06:33:22.4010659Z #22 544.9 ptxas info : Compile time = 931.013 ms 2025-09-07T06:33:22.4015706Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:22.4026138Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.4031133Z #22 544.9 64 bytes stack frame, 276 bytes spill stores, 316 bytes spill loads 2025-09-07T06:33:22.4032244Z #22 544.9 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:33:22.4033432Z #22 544.9 ptxas info : Compile time = 2368.489 ms 2025-09-07T06:33:22.4038353Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:33:22.4047612Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.4052763Z #22 544.9 8 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads 2025-09-07T06:33:22.4053876Z #22 544.9 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:33:22.4054836Z #22 544.9 ptxas info : Compile time = 1505.288 ms 2025-09-07T06:33:22.4059911Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:22.4069238Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.4074477Z #22 544.9 48 bytes stack frame, 100 bytes spill stores, 128 bytes spill 
loads 2025-09-07T06:33:22.4075639Z #22 544.9 ptxas info : Used 168 registers, used 9 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:22.4076599Z #22 544.9 ptxas info : Compile time = 1524.975 ms 2025-09-07T06:33:22.4081728Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:22.4090978Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.4096478Z #22 544.9 56 bytes stack frame, 300 bytes spill stores, 344 bytes spill loads 2025-09-07T06:33:22.4097745Z #22 544.9 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:33:22.4098700Z #22 544.9 ptxas info : Compile time = 2653.157 ms 2025-09-07T06:33:22.4103539Z #22 544.9 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:22.4112602Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.4117395Z #22 544.9 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:33:22.4118489Z #22 544.9 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:33:22.4119461Z #22 544.9 ptxas info : Compile time = 1109.307 ms 2025-09-07T06:33:22.4124443Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:22.4133802Z #22 544.9 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.4139003Z #22 544.9 32 bytes stack frame, 120 bytes spill stores, 148 bytes spill loads 2025-09-07T06:33:22.4140281Z #22 544.9 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:33:22.4141232Z #22 544.9 ptxas info : Compile time = 1229.888 ms 2025-09-07T06:33:22.4146193Z #22 544.9 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:22.4155605Z #22 544.9 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:22.4160594Z #22 544.9 48 bytes stack frame, 232 bytes spill stores, 276 bytes spill loads 2025-09-07T06:33:22.4161822Z #22 544.9 ptxas info : Used 168 registers, used 16 
barriers, 48 bytes cumulative stack size 2025-09-07T06:33:22.4162774Z #22 544.9 ptxas info : Compile time = 2292.939 ms 2025-09-07T06:33:27.0982003Z #22 549.8 [51/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a 
-DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:27.1000114Z #22 549.8 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:33:27.1004841Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:27.1014004Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1018764Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1019819Z #22 549.8 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:33:27.1020743Z #22 549.8 ptxas info : Compile time = 2.092 ms 2025-09-07T06:33:27.1025916Z #22 549.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:27.1035626Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1040647Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1041618Z #22 549.8 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:33:27.1042477Z #22 549.8 ptxas info : Compile time = 21.180 ms 2025-09-07T06:33:27.1047939Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:27.1057955Z #22 549.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1063448Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1064418Z #22 549.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:27.1065330Z #22 549.8 ptxas info : Compile time = 0.942 ms 2025-09-07T06:33:27.1070058Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:27.1078516Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1083039Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1084038Z #22 549.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 
2025-09-07T06:33:27.1084908Z #22 549.8 ptxas info : Compile time = 0.634 ms 2025-09-07T06:33:27.1090054Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:27.1103667Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1109108Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1110184Z #22 549.8 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:33:27.1111120Z #22 549.8 ptxas info : Compile time = 0.569 ms 2025-09-07T06:33:27.1116747Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:27.1126376Z #22 549.8 ptxas 
info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1131857Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1133098Z #22 549.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:27.1134047Z #22 549.8 ptxas info : Compile time = 0.569 ms 2025-09-07T06:33:27.1139181Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:27.1149434Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1154771Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1155734Z #22 549.8 ptxas info : Used 
4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:27.1156585Z #22 549.8 ptxas info : Compile time = 0.626 ms 2025-09-07T06:33:27.1161694Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:27.1171414Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1176671Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1177535Z #22 549.8 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:33:27.1178295Z #22 549.8 ptxas info : Compile time = 0.543 ms 2025-09-07T06:33:27.1182853Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:33:27.1191264Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1196974Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1198014Z #22 549.8 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:27.1198937Z #22 549.8 ptxas info : Compile time = 0.545 ms 2025-09-07T06:33:27.1204154Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:27.1214875Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1220363Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1221368Z #22 549.8 
ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:33:27.1222295Z #22 549.8 ptxas info : Compile time = 0.569 ms 2025-09-07T06:33:27.1222964Z #22 549.8 ptxas info : 10 bytes gmem 2025-09-07T06:33:27.1227806Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:27.1237115Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1242084Z #22 549.8 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:33:27.1243149Z #22 549.8 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:33:27.1244164Z #22 549.8 ptxas info : Compile time = 666.667 ms 2025-09-07T06:33:27.1249693Z #22 549.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:27.1258636Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEESB_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SB_SB_EEESA_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1263736Z #22 549.8 24 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:33:27.1264917Z #22 549.8 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:33:27.1265972Z #22 549.8 ptxas info : Compile time = 732.233 ms 2025-09-07T06:33:27.1271417Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:27.1281741Z #22 549.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1287629Z #22 549.8 40 bytes stack frame, 76 bytes spill stores, 84 bytes spill loads 2025-09-07T06:33:27.1288746Z #22 549.8 ptxas info : Used 168 registers, used 9 barriers, 40 bytes cumulative stack size 2025-09-07T06:33:27.1289944Z #22 549.8 ptxas info : Compile time = 812.702 ms 2025-09-07T06:33:27.1296146Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:27.1305980Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1311409Z #22 549.8 56 bytes stack frame, 280 bytes spill stores, 304 bytes spill loads 2025-09-07T06:33:27.1312686Z #22 549.8 ptxas info : Used 168 registers, used 16 barriers, 
56 bytes cumulative stack size 2025-09-07T06:33:27.1313805Z #22 549.8 ptxas info : Compile time = 1746.020 ms 2025-09-07T06:33:27.1319779Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:27.1330488Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1336479Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1337531Z #22 549.8 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:27.1338405Z #22 549.8 ptxas info : Compile time = 1121.703 ms 2025-09-07T06:33:27.1344436Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:33:27.1354856Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1360979Z #22 549.8 32 bytes stack frame, 144 bytes spill stores, 164 bytes spill loads 2025-09-07T06:33:27.1362275Z #22 549.8 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:33:27.1363416Z #22 549.8 ptxas info : Compile time = 1260.433 ms 2025-09-07T06:33:27.1369418Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:27.1379574Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1384410Z #22 549.8 40 bytes stack frame, 232 bytes spill stores, 292 
bytes spill loads 2025-09-07T06:33:27.1385507Z #22 549.8 ptxas info : Used 168 registers, used 16 barriers, 40 bytes cumulative stack size 2025-09-07T06:33:27.1386464Z #22 549.8 ptxas info : Compile time = 2492.490 ms 2025-09-07T06:33:27.1391437Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:27.1400589Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.1405551Z #22 549.8 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:27.1406434Z #22 549.8 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:27.1407174Z #22 549.8 ptxas info : Compile time = 994.022 ms 2025-09-07T06:33:27.2613921Z #22 549.8 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:27.2624321Z #22 549.8 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.2629468Z #22 549.8 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:33:27.2630578Z #22 549.8 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:33:27.2631549Z #22 549.8 ptxas info : Compile time = 1146.381 ms 2025-09-07T06:33:27.2636919Z #22 549.8 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:27.2647238Z #22 549.8 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:27.2653167Z #22 549.8 48 bytes stack frame, 252 bytes spill stores, 268 bytes spill loads 2025-09-07T06:33:27.2654449Z #22 549.8 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:27.2655498Z #22 549.8 ptxas info : Compile time = 2170.164 ms 2025-09-07T06:33:32.0425511Z #22 554.7 [52/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_fp16_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda 
-D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:32.0443970Z #22 554.7 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:33:32.0448579Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.0457098Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0462069Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:32.0463175Z #22 554.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:32.0464283Z #22 554.7 ptxas info : Compile time = 1.863 ms 2025-09-07T06:33:32.0468963Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.0477903Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0483109Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0484127Z #22 554.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:32.0485044Z #22 554.7 ptxas info : Compile time = 0.945 ms 2025-09-07T06:33:32.0490193Z #22 554.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.0499514Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0504874Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0505864Z #22 554.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:32.0506950Z #22 554.7 ptxas info : Compile time = 0.766 ms 2025-09-07T06:33:32.0511765Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.0520447Z #22 554.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0524964Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0525955Z #22 554.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:32.0526790Z #22 554.7 ptxas info : Compile time = 20.897 ms 2025-09-07T06:33:32.0532380Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.0542054Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0547255Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0548337Z #22 554.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 
2025-09-07T06:33:32.0549128Z #22 554.7 ptxas info : Compile time = 0.784 ms 2025-09-07T06:33:32.0554243Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.0562997Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0568077Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0569116Z #22 554.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:32.0570112Z #22 554.7 ptxas info : Compile time = 0.735 ms 2025-09-07T06:33:32.0574704Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.0583096Z #22 554.7 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0587742Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0588841Z #22 554.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:32.0589806Z #22 554.7 ptxas info : Compile time = 0.728 ms 2025-09-07T06:33:32.0595479Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.0606311Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0611735Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0612942Z #22 554.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 
2025-09-07T06:33:32.0613814Z #22 554.7 ptxas info : Compile time = 0.637 ms 2025-09-07T06:33:32.0618856Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:32.0627790Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0633128Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0634153Z #22 554.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:32.0635050Z #22 554.7 ptxas info : Compile time = 0.515 ms 2025-09-07T06:33:32.0635839Z #22 554.7 ptxas info : 10 bytes gmem 2025-09-07T06:33:32.0640516Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:33:32.0648887Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0653598Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0654524Z #22 554.7 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:32.0655277Z #22 554.7 ptxas info : Compile time = 677.592 ms 2025-09-07T06:33:32.0660413Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.0669757Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0674640Z #22 554.7 64 bytes stack frame, 144 bytes spill stores, 180 bytes spill loads 2025-09-07T06:33:32.0675782Z #22 554.7 ptxas info : Used 168 registers, 
used 9 barriers, 64 bytes cumulative stack size 2025-09-07T06:33:32.0676732Z #22 554.7 ptxas info : Compile time = 1014.027 ms 2025-09-07T06:33:32.0681830Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.0690640Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0696298Z #22 554.7 80 bytes stack frame, 296 bytes spill stores, 344 bytes spill loads 2025-09-07T06:33:32.0697499Z #22 554.7 ptxas info : Used 168 registers, used 16 barriers, 80 bytes cumulative stack size 2025-09-07T06:33:32.0698501Z #22 554.7 ptxas info : Compile time = 2000.048 ms 2025-09-07T06:33:32.0703589Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 
'sm_90a' 2025-09-07T06:33:32.0711745Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0716637Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0717581Z #22 554.7 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:32.0718375Z #22 554.7 ptxas info : Compile time = 1120.873 ms 2025-09-07T06:33:32.0723958Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.0733609Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0739047Z #22 554.7 104 bytes stack frame, 200 bytes spill stores, 240 bytes spill loads 2025-09-07T06:33:32.0740043Z #22 
554.7 ptxas info : Used 168 registers, used 9 barriers, 104 bytes cumulative stack size 2025-09-07T06:33:32.0740883Z #22 554.7 ptxas info : Compile time = 1309.685 ms 2025-09-07T06:33:32.0745909Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.0755831Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0761279Z #22 554.7 88 bytes stack frame, 348 bytes spill stores, 420 bytes spill loads 2025-09-07T06:33:32.0762449Z #22 554.7 ptxas info : Used 168 registers, used 16 barriers, 88 bytes cumulative stack size 2025-09-07T06:33:32.0763410Z #22 554.7 ptxas info : Compile time = 2406.160 ms 2025-09-07T06:33:32.0768400Z #22 554.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.0776962Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0781796Z #22 554.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:32.0782802Z #22 554.7 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:32.0783588Z #22 554.7 ptxas info : Compile time = 765.412 ms 2025-09-07T06:33:32.0788990Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.0798514Z #22 554.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0803973Z #22 554.7 64 bytes stack frame, 148 bytes spill stores, 176 bytes spill loads 2025-09-07T06:33:32.0805381Z #22 554.7 ptxas info : Used 168 registers, used 9 barriers, 64 bytes cumulative stack size 2025-09-07T06:33:32.0806273Z #22 554.7 ptxas info : Compile time = 1355.068 ms 2025-09-07T06:33:32.0811087Z #22 554.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:32.0820354Z #22 554.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEESA_NS7_ILi192EEEEEELi128ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SA_SA_EEES9_SD_SF_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:32.0825811Z #22 554.7 80 bytes stack frame, 288 bytes spill stores, 328 bytes spill loads 2025-09-07T06:33:32.0826931Z #22 554.7 ptxas info : Used 168 registers, used 16 
barriers, 80 bytes cumulative stack size 2025-09-07T06:33:32.0827932Z #22 554.7 ptxas info : Compile time = 2585.935 ms 2025-09-07T06:33:36.4785882Z #22 559.1 [53/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a 
-DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:36.4804651Z #22 559.1 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:33:36.4809706Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:36.4817969Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.4823426Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.4824552Z #22 559.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:36.4825399Z #22 559.1 ptxas info : Compile time = 1.772 ms 2025-09-07T06:33:36.4830430Z #22 559.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:36.4840374Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.4846048Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.4847099Z #22 559.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:36.4847993Z #22 559.1 ptxas info : Compile time = 0.895 ms 2025-09-07T06:33:36.4853645Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:36.4863882Z #22 559.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.4869350Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.4870430Z #22 559.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:36.4871333Z #22 559.1 ptxas info : Compile time = 0.805 ms 2025-09-07T06:33:36.4876638Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:36.4889456Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.4895862Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.4896838Z #22 559.1 ptxas info : Used 4 registers, used 0 barriers, 2624 
bytes cmem[0] 2025-09-07T06:33:36.4897668Z #22 559.1 ptxas info : Compile time = 0.617 ms 2025-09-07T06:33:36.4902956Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:36.4912747Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.4918422Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.4919554Z #22 559.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:36.4920426Z #22 559.1 ptxas info : Compile time = 0.568 ms 2025-09-07T06:33:36.4925607Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 
for 'sm_80' 2025-09-07T06:33:36.4935566Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.4940892Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.4941918Z #22 559.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:36.4942835Z #22 559.1 ptxas info : Compile time = 0.589 ms 2025-09-07T06:33:36.4948094Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:36.4958271Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.4963950Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:36.4965079Z #22 559.1 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:33:36.4966050Z #22 559.1 ptxas info : Compile time = 0.557 ms 2025-09-07T06:33:36.4971644Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:36.4982619Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.4988618Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.4989718Z #22 559.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:36.4990688Z #22 559.1 ptxas info : Compile time = 0.553 ms 2025-09-07T06:33:36.4996418Z #22 559.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:36.5007221Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.5013015Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.5014011Z #22 559.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:36.5014947Z #22 559.1 ptxas info : Compile time = 0.550 ms 2025-09-07T06:33:36.5015537Z #22 559.1 ptxas info : 10 bytes gmem 2025-09-07T06:33:36.5020481Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:36.5029992Z #22 559.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.5035339Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.5036355Z #22 559.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:36.5037224Z #22 559.1 ptxas info : Compile time = 718.781 ms 2025-09-07T06:33:36.5042913Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:36.5054040Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.5059869Z #22 559.1 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:33:36.5061120Z #22 559.1 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack 
size 2025-09-07T06:33:36.5062208Z #22 559.1 ptxas info : Compile time = 879.787 ms 2025-09-07T06:33:36.5068000Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:36.5078734Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.5084647Z #22 559.1 48 bytes stack frame, 100 bytes spill stores, 132 bytes spill loads 2025-09-07T06:33:36.5085918Z #22 559.1 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:36.5087039Z #22 559.1 ptxas info : Compile time = 1782.147 ms 2025-09-07T06:33:36.5093265Z #22 559.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:36.5103325Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.5108852Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.5109857Z #22 559.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:36.5110667Z #22 559.1 ptxas info : Compile time = 1550.377 ms 2025-09-07T06:33:36.5130045Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:36.5141204Z #22 559.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.5147115Z #22 559.1 16 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads 2025-09-07T06:33:36.5148380Z #22 559.1 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:33:36.5149510Z #22 559.1 ptxas info : Compile time = 1684.032 ms 2025-09-07T06:33:36.5154750Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:36.5164623Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.5169911Z #22 559.1 64 bytes stack frame, 128 bytes spill stores, 172 bytes spill loads 2025-09-07T06:33:36.5171107Z #22 559.1 
ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:33:36.5172159Z #22 559.1 ptxas info : Compile time = 2772.327 ms 2025-09-07T06:33:36.5178150Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:36.5188560Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.5194526Z #22 559.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:36.5195570Z #22 559.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:36.5196445Z #22 559.1 ptxas info : Compile time = 1266.900 ms 2025-09-07T06:33:36.5201725Z #22 559.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:36.5213078Z #22 559.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.5218639Z #22 559.1 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:33:36.5219849Z #22 559.1 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:33:36.5220951Z #22 559.1 ptxas info : Compile time = 1046.493 ms 2025-09-07T06:33:36.5226804Z #22 559.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:36.5237044Z #22 559.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:36.5242658Z #22 559.1 48 bytes stack frame, 100 bytes spill stores, 128 bytes spill loads 2025-09-07T06:33:36.5243922Z #22 559.1 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:36.5244994Z #22 559.1 ptxas info : Compile time = 2304.523 ms 2025-09-07T06:33:47.4591646Z #22 570.1 [54/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ 
--extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:47.4611180Z #22 570.1 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:33:47.4616668Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:47.4626612Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4632199Z #22 570.1 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads 2025-09-07T06:33:47.4633322Z #22 570.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:47.4634266Z #22 570.1 ptxas info : Compile time = 1.709 ms 2025-09-07T06:33:47.4640072Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:47.4650719Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4660072Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:47.4661202Z #22 570.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:47.4662180Z #22 570.1 ptxas info : Compile time = 0.890 ms 2025-09-07T06:33:47.4667879Z #22 570.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:47.4678493Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4684688Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:47.4685816Z #22 570.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:47.4686774Z #22 570.1 ptxas info : Compile time = 0.765 ms 2025-09-07T06:33:47.4692537Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:47.4702551Z #22 570.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4708129Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:47.4709239Z #22 570.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:47.4710185Z #22 570.1 ptxas info : Compile time = 0.497 ms 2025-09-07T06:33:47.4715782Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:47.4726488Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4732155Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:47.4733434Z #22 570.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes 
cmem[0] 2025-09-07T06:33:47.4734423Z #22 570.1 ptxas info : Compile time = 0.482 ms 2025-09-07T06:33:47.4740316Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:47.4750588Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4756483Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:47.4757554Z #22 570.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:47.4758489Z #22 570.1 ptxas info : Compile time = 0.498 ms 2025-09-07T06:33:47.4763741Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:33:47.4773678Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4779208Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:47.4780308Z #22 570.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:47.4781284Z #22 570.1 ptxas info : Compile time = 0.506 ms 2025-09-07T06:33:47.4786980Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:47.4797719Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4803393Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:47.4804447Z #22 570.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:47.4805385Z #22 570.1 ptxas info : Compile time = 0.479 ms 2025-09-07T06:33:47.4810959Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:47.4821532Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4827549Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:47.4828656Z #22 570.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:47.4829621Z #22 570.1 ptxas info : Compile time = 0.511 ms 2025-09-07T06:33:47.4830324Z #22 570.1 ptxas info : 10 bytes gmem 2025-09-07T06:33:47.4835676Z #22 570.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:47.4845461Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4850779Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:47.4851770Z #22 570.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:47.4852765Z #22 570.1 ptxas info : Compile time = 635.240 ms 2025-09-07T06:33:47.4858376Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:47.4868996Z #22 570.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4874460Z #22 570.1 32 bytes stack frame, 56 bytes spill stores, 44 bytes spill loads 2025-09-07T06:33:47.4875636Z #22 570.1 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:33:47.4876730Z #22 570.1 ptxas info : Compile time = 642.691 ms 2025-09-07T06:33:47.4882301Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:47.4893239Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4899134Z #22 570.1 128 bytes stack frame, 212 bytes spill stores, 308 bytes spill loads 2025-09-07T06:33:47.4900414Z #22 570.1 
ptxas info : Used 168 registers, used 16 barriers, 128 bytes cumulative stack size 2025-09-07T06:33:47.4901519Z #22 570.1 ptxas info : Compile time = 1324.591 ms 2025-09-07T06:33:47.4906762Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:47.4916675Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4921909Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:47.4922895Z #22 570.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:47.4923695Z #22 570.1 ptxas info : Compile time = 946.283 ms 2025-09-07T06:33:47.4929349Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 
'sm_90a' 2025-09-07T06:33:47.4940137Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4945866Z #22 570.1 24 bytes stack frame, 52 bytes spill stores, 44 bytes spill loads 2025-09-07T06:33:47.4947104Z #22 570.1 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:33:47.4948193Z #22 570.1 ptxas info : Compile time = 1188.234 ms 2025-09-07T06:33:47.4953967Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:47.4964644Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4970440Z #22 570.1 104 bytes stack frame, 152 
bytes spill stores, 208 bytes spill loads 2025-09-07T06:33:47.4971702Z #22 570.1 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:33:47.4972922Z #22 570.1 ptxas info : Compile time = 2258.763 ms 2025-09-07T06:33:47.4978182Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:47.4987976Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.4993357Z #22 570.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:47.4994394Z #22 570.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:33:47.4995207Z #22 570.1 ptxas info : Compile time = 649.930 ms 2025-09-07T06:33:47.5000788Z #22 570.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:47.5011569Z #22 570.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.5017533Z #22 570.1 24 bytes stack frame, 52 bytes spill stores, 40 bytes spill loads 2025-09-07T06:33:47.5018760Z #22 570.1 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:33:47.5019875Z #22 570.1 ptxas info : Compile time = 960.023 ms 2025-09-07T06:33:47.5025699Z #22 570.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:47.5036447Z #22 570.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:47.5042298Z #22 570.1 104 bytes stack frame, 156 bytes spill stores, 212 bytes spill loads 2025-09-07T06:33:47.5043544Z #22 570.1 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:33:47.5044631Z #22 570.1 ptxas info : Compile time = 2761.116 ms 2025-09-07T06:33:53.6574105Z #22 576.3 [55/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ 
--extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:33:53.6593216Z #22 576.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:33:53.6598396Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:53.6607847Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6613539Z #22 576.3 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads 2025-09-07T06:33:53.6614597Z #22 576.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:53.6615469Z #22 576.3 ptxas info : Compile time = 1.839 ms 2025-09-07T06:33:53.6620524Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:53.6630517Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6635962Z #22 576.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:53.6637137Z #22 576.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:53.6638088Z #22 576.3 ptxas info : Compile time = 0.949 ms 2025-09-07T06:33:53.6644156Z #22 576.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:53.6654611Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6660154Z #22 576.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:53.6661470Z #22 576.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:53.6662304Z #22 576.3 ptxas info : Compile time = 0.896 ms 2025-09-07T06:33:53.6667723Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:53.6678029Z #22 576.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6683928Z #22 576.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:53.6685041Z #22 576.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:53.6685969Z #22 576.3 ptxas info : Compile time = 0.649 ms 2025-09-07T06:33:53.6691357Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:53.6701969Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6707635Z #22 576.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:53.6708708Z #22 576.3 ptxas info : Used 4 registers, used 0 barriers, 2624 
bytes cmem[0] 2025-09-07T06:33:53.6709614Z #22 576.3 ptxas info : Compile time = 0.630 ms 2025-09-07T06:33:53.6717571Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:53.6728165Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6734422Z #22 576.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:53.6735548Z #22 576.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:53.6736470Z #22 576.3 ptxas info : Compile time = 0.607 ms 2025-09-07T06:33:53.6742150Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 
for 'sm_80' 2025-09-07T06:33:53.6752024Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6757867Z #22 576.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:53.6758965Z #22 576.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:53.6759974Z #22 576.3 ptxas info : Compile time = 0.592 ms 2025-09-07T06:33:53.6765807Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:53.6775559Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6780725Z #22 576.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:33:53.6781786Z #22 576.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:33:53.6782655Z #22 576.3 ptxas info : Compile time = 0.636 ms 2025-09-07T06:33:53.6788643Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:53.6799721Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6805357Z #22 576.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:53.6806494Z #22 576.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:53.6807494Z #22 576.3 ptxas info : Compile time = 0.608 ms 2025-09-07T06:33:53.6813369Z #22 576.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:33:53.6823343Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6829220Z #22 576.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:33:53.6830330Z #22 576.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:33:53.6831133Z #22 576.3 ptxas info : Compile time = 0.574 ms 2025-09-07T06:33:53.6832058Z #22 576.3 ptxas info : 10 bytes gmem 2025-09-07T06:33:53.6837021Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:53.6846890Z #22 576.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6852016Z #22 576.3 24 bytes stack frame, 52 bytes spill stores, 56 bytes spill loads 2025-09-07T06:33:53.6853342Z #22 576.3 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:33:53.6854337Z #22 576.3 ptxas info : Compile time = 641.462 ms 2025-09-07T06:33:53.6859831Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:53.6869399Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6875144Z #22 576.3 32 bytes stack frame, 108 bytes spill stores, 112 bytes spill loads 2025-09-07T06:33:53.6876342Z #22 576.3 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 
2025-09-07T06:33:53.6877409Z #22 576.3 ptxas info : Compile time = 853.299 ms 2025-09-07T06:33:53.6882701Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:53.6893629Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6899543Z #22 576.3 56 bytes stack frame, 240 bytes spill stores, 252 bytes spill loads 2025-09-07T06:33:53.6900790Z #22 576.3 ptxas info : Used 168 registers, used 9 barriers, 56 bytes cumulative stack size 2025-09-07T06:33:53.6902040Z #22 576.3 ptxas info : Compile time = 1032.967 ms 2025-09-07T06:33:53.6907666Z #22 576.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:53.6917733Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6923443Z #22 576.3 64 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads 2025-09-07T06:33:53.6924715Z #22 576.3 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:33:53.6926090Z #22 576.3 ptxas info : Compile time = 1850.054 ms 2025-09-07T06:33:53.6931315Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:53.6941463Z #22 576.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6946647Z #22 576.3 32 bytes stack frame, 100 bytes spill stores, 88 bytes spill loads 2025-09-07T06:33:53.6947866Z #22 576.3 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:33:53.6948934Z #22 576.3 ptxas info : Compile time = 1571.132 ms 2025-09-07T06:33:53.6954735Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:53.6964736Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6970456Z #22 576.3 48 bytes stack frame, 108 bytes spill stores, 140 bytes spill loads 2025-09-07T06:33:53.6971766Z #22 576.3 ptxas info : 
Used 168 registers, used 9 barriers, 48 bytes cumulative stack size 2025-09-07T06:33:53.6972974Z #22 576.3 ptxas info : Compile time = 1843.909 ms 2025-09-07T06:33:53.6978505Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:53.6988927Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.6994751Z #22 576.3 64 bytes stack frame, 256 bytes spill stores, 312 bytes spill loads 2025-09-07T06:33:53.6996259Z #22 576.3 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:33:53.6997331Z #22 576.3 ptxas info : Compile time = 3072.885 ms 2025-09-07T06:33:53.7002730Z #22 576.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:53.7013323Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.7018691Z #22 576.3 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:33:53.7019904Z #22 576.3 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:33:53.7020950Z #22 576.3 ptxas info : Compile time = 1216.155 ms 2025-09-07T06:33:53.7980037Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:53.7990181Z #22 576.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.7996506Z #22 576.3 40 bytes stack frame, 148 bytes spill stores, 172 bytes spill loads 2025-09-07T06:33:53.7997740Z #22 576.3 ptxas info : Used 168 registers, used 9 barriers, 40 bytes cumulative stack size 2025-09-07T06:33:53.7998727Z #22 576.3 ptxas info : Compile time = 1309.413 ms 2025-09-07T06:33:53.8004348Z #22 576.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:33:53.8014494Z #22 576.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:33:53.8020299Z #22 576.3 56 bytes stack frame, 248 bytes spill stores, 292 bytes spill loads 2025-09-07T06:33:53.8021488Z #22 576.3 
ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:33:53.8022458Z #22 576.3 ptxas info : Compile time = 2388.888 ms 2025-09-07T06:34:09.6945430Z #22 592.4 [56/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 
-gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:09.6967379Z #22 592.4 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:34:09.6972940Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:09.6982363Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.6986587Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.6987347Z #22 592.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:09.6988004Z #22 592.4 ptxas info : Compile time = 1.688 ms 2025-09-07T06:34:09.6992410Z #22 592.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:09.6999413Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7003223Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7003978Z #22 592.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:09.7004626Z #22 592.4 ptxas info : Compile time = 0.890 ms 2025-09-07T06:34:09.7008759Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:09.7015920Z #22 592.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7019713Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7020452Z #22 592.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:09.7021105Z #22 592.4 ptxas info : Compile time = 0.785 ms 2025-09-07T06:34:09.7024995Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:09.7032250Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7037587Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7038683Z #22 592.4 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 
2025-09-07T06:34:09.7039616Z #22 592.4 ptxas info : Compile time = 0.578 ms 2025-09-07T06:34:09.7045808Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:09.7056699Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7062229Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7063284Z #22 592.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:09.7064186Z #22 592.4 ptxas info : Compile time = 0.535 ms 2025-09-07T06:34:09.7069958Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:34:09.7080661Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7086509Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7087627Z #22 592.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:09.7088392Z #22 592.4 ptxas info : Compile time = 0.527 ms 2025-09-07T06:34:09.7096543Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:09.7105757Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7109483Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7110233Z #22 
592.4 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:34:09.7110896Z #22 592.4 ptxas info : Compile time = 0.506 ms 2025-09-07T06:34:09.7115010Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:09.7121987Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7125878Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7126647Z #22 592.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:09.7127315Z #22 592.4 ptxas info : Compile time = 0.566 ms 2025-09-07T06:34:09.7131402Z #22 592.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:09.7138542Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7142351Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7143099Z #22 592.4 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:09.7143960Z #22 592.4 ptxas info : Compile time = 0.552 ms 2025-09-07T06:34:09.7144445Z #22 592.4 ptxas info : 10 bytes gmem 2025-09-07T06:34:09.7147891Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:09.7155856Z #22 592.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7160815Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7161796Z #22 592.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:34:09.7162634Z #22 592.4 ptxas info : Compile time = 719.011 ms 2025-09-07T06:34:09.7168776Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:09.7179263Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7184961Z #22 592.4 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:34:09.7186163Z #22 592.4 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 
2025-09-07T06:34:09.7187188Z #22 592.4 ptxas info : Compile time = 862.194 ms 2025-09-07T06:34:09.7193314Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:09.7203863Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7209682Z #22 592.4 48 bytes stack frame, 100 bytes spill stores, 132 bytes spill loads 2025-09-07T06:34:09.7210886Z #22 592.4 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:34:09.7212291Z #22 592.4 ptxas info : Compile time = 1894.070 ms 2025-09-07T06:34:09.7217985Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:34:09.7225051Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7228739Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7229433Z #22 592.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:34:09.7230009Z #22 592.4 ptxas info : Compile time = 1563.395 ms 2025-09-07T06:34:09.7234040Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:09.7241042Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7244906Z #22 592.4 16 bytes stack frame, 64 bytes spill stores, 64 bytes spill loads 2025-09-07T06:34:09.7245728Z 
#22 592.4 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:34:09.7246605Z #22 592.4 ptxas info : Compile time = 1600.452 ms 2025-09-07T06:34:09.7250409Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:09.7257565Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7261377Z #22 592.4 64 bytes stack frame, 128 bytes spill stores, 172 bytes spill loads 2025-09-07T06:34:09.7262237Z #22 592.4 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:34:09.7263139Z #22 592.4 ptxas info : Compile time = 2159.575 ms 2025-09-07T06:34:09.7266851Z #22 592.4 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:09.7277001Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7282798Z #22 592.4 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:09.7283795Z #22 592.4 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:34:09.7284626Z #22 592.4 ptxas info : Compile time = 826.024 ms 2025-09-07T06:34:09.7290666Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:09.7301355Z #22 592.4 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7306978Z #22 592.4 16 bytes stack frame, 40 bytes spill stores, 28 bytes spill loads 2025-09-07T06:34:09.7308167Z #22 592.4 ptxas info : Used 168 registers, used 9 barriers, 16 bytes cumulative stack size 2025-09-07T06:34:09.7309535Z #22 592.4 ptxas info : Compile time = 793.176 ms 2025-09-07T06:34:09.7315300Z #22 592.4 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:09.7327774Z #22 592.4 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:09.7333632Z #22 592.4 48 bytes stack frame, 100 bytes spill stores, 128 bytes spill loads 2025-09-07T06:34:09.7335212Z #22 592.4 ptxas info : Used 
168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:34:09.7336265Z #22 592.4 ptxas info : Compile time = 1603.638 ms 2025-09-07T06:34:31.0775490Z #22 613.7 [57/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 
-gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:31.0794779Z #22 613.7 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:34:31.0800656Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:31.0810203Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.0815037Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.0816038Z #22 613.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:31.0816842Z #22 613.7 ptxas info : Compile time = 1.779 ms 2025-09-07T06:34:31.0821423Z #22 613.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:31.0831107Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.0836802Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.0837892Z #22 613.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:31.0838751Z #22 613.7 ptxas info : Compile time = 0.914 ms 2025-09-07T06:34:31.0844101Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:31.0854632Z #22 613.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.0859915Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.0860940Z #22 613.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:31.0861857Z #22 613.7 ptxas info : Compile time = 0.719 ms 2025-09-07T06:34:31.0866943Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:31.0876093Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.0880899Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.0882018Z #22 613.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 
2025-09-07T06:34:31.0882969Z #22 613.7 ptxas info : Compile time = 20.628 ms 2025-09-07T06:34:31.0888416Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:31.0902820Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.0908654Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.0909728Z #22 613.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:31.0910651Z #22 613.7 ptxas info : Compile time = 0.787 ms 2025-09-07T06:34:31.0916155Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:34:31.0932963Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.0938408Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.0939530Z #22 613.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:31.0940452Z #22 613.7 ptxas info : Compile time = 0.648 ms 2025-09-07T06:34:31.0945952Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:31.0955176Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.0960500Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.0961615Z #22 613.7 ptxas info 
: Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:31.0962591Z #22 613.7 ptxas info : Compile time = 0.609 ms 2025-09-07T06:34:31.0968125Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:31.0978021Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.0983216Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.0984326Z #22 613.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:31.0985258Z #22 613.7 ptxas info : Compile time = 0.568 ms 2025-09-07T06:34:31.0990445Z #22 613.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:31.1001424Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.1007651Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.1008826Z #22 613.7 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:31.1009851Z #22 613.7 ptxas info : Compile time = 0.584 ms 2025-09-07T06:34:31.1010596Z #22 613.7 ptxas info : 10 bytes gmem 2025-09-07T06:34:31.1017740Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:31.1026725Z #22 613.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.1031802Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.1032796Z #22 613.7 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:34:31.1033635Z #22 613.7 ptxas info : Compile time = 829.616 ms 2025-09-07T06:34:31.1039267Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:31.1049536Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.1055456Z #22 613.7 72 bytes stack frame, 160 bytes spill stores, 200 bytes spill loads 2025-09-07T06:34:31.1056663Z #22 613.7 ptxas info : Used 168 registers, used 9 barriers, 72 bytes 
cumulative stack size 2025-09-07T06:34:31.1057704Z #22 613.7 ptxas info : Compile time = 1214.559 ms 2025-09-07T06:34:31.1063536Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:31.1074256Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.1079492Z #22 613.7 104 bytes stack frame, 348 bytes spill stores, 412 bytes spill loads 2025-09-07T06:34:31.1080824Z #22 613.7 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:34:31.1081961Z #22 613.7 ptxas info : Compile time = 2246.956 ms 2025-09-07T06:34:31.1087485Z #22 613.7 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:31.1098250Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.1103914Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.1104945Z #22 613.7 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:34:31.1105759Z #22 613.7 ptxas info : Compile time = 1282.978 ms 2025-09-07T06:34:31.1110797Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:31.1121700Z #22 613.7 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.1127406Z #22 613.7 104 bytes stack frame, 216 bytes spill stores, 256 bytes spill loads 2025-09-07T06:34:31.1128690Z #22 613.7 ptxas info : Used 168 registers, used 9 barriers, 104 bytes cumulative stack size 2025-09-07T06:34:31.1129781Z #22 613.7 ptxas info : Compile time = 2018.298 ms 2025-09-07T06:34:31.1135672Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:31.1146583Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.1152192Z #22 613.7 96 bytes stack frame, 352 bytes spill stores, 412 bytes spill loads 2025-09-07T06:34:31.1153430Z #22 613.7 
ptxas info : Used 168 registers, used 16 barriers, 96 bytes cumulative stack size 2025-09-07T06:34:31.1154494Z #22 613.7 ptxas info : Compile time = 3275.441 ms 2025-09-07T06:34:31.1159945Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:31.1168811Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.1173420Z #22 613.7 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:31.1174348Z #22 613.7 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:34:31.1175123Z #22 613.7 ptxas info : Compile time = 1133.730 ms 2025-09-07T06:34:31.1180306Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 
for 'sm_90a' 2025-09-07T06:34:31.1199108Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.1204536Z #22 613.7 72 bytes stack frame, 156 bytes spill stores, 188 bytes spill loads 2025-09-07T06:34:31.1205750Z #22 613.7 ptxas info : Used 168 registers, used 9 barriers, 72 bytes cumulative stack size 2025-09-07T06:34:31.1206785Z #22 613.7 ptxas info : Compile time = 1573.099 ms 2025-09-07T06:34:31.1211653Z #22 613.7 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:31.1221198Z #22 613.7 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:31.1226817Z #22 613.7 96 bytes stack 
frame, 344 bytes spill stores, 388 bytes spill loads 2025-09-07T06:34:31.1227798Z #22 613.7 ptxas info : Used 168 registers, used 16 barriers, 96 bytes cumulative stack size 2025-09-07T06:34:31.1228791Z #22 613.7 ptxas info : Compile time = 2765.183 ms 2025-09-07T06:34:33.3978809Z #22 616.1 [58/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 
-D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:33.3998123Z #22 616.1 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:34:33.4001824Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.4015979Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4021671Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4022875Z #22 616.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:33.4024096Z #22 616.1 ptxas info : Compile time = 1.646 ms 2025-09-07T06:34:33.4030211Z #22 616.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.4041650Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4047731Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4048894Z #22 616.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:33.4049615Z #22 616.1 ptxas info : Compile time = 0.845 ms 2025-09-07T06:34:33.4053637Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.4060958Z #22 616.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4066281Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4067660Z #22 616.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:33.4068625Z #22 616.1 ptxas info : Compile time = 0.793 ms 2025-09-07T06:34:33.4074151Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.4084545Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4090254Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4091425Z #22 616.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:33.4096713Z #22 
616.1 ptxas info : Compile time = 0.551 ms 2025-09-07T06:34:33.4101531Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.4109404Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4113991Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4114961Z #22 616.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:33.4115969Z #22 616.1 ptxas info : Compile time = 0.541 ms 2025-09-07T06:34:33.4122005Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.4133262Z #22 616.1 ptxas info : 
Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4139596Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4140754Z #22 616.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:33.4141976Z #22 616.1 ptxas info : Compile time = 0.524 ms 2025-09-07T06:34:33.4147684Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.4155903Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4159511Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4160276Z #22 616.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 
2025-09-07T06:34:33.4160940Z #22 616.1 ptxas info : Compile time = 0.511 ms 2025-09-07T06:34:33.4165340Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.4175945Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4182085Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4183224Z #22 616.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:33.4184235Z #22 616.1 ptxas info : Compile time = 0.502 ms 2025-09-07T06:34:33.4190195Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:34:33.4201609Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4206335Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4207103Z #22 616.1 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:33.4207984Z #22 616.1 ptxas info : Compile time = 0.532 ms 2025-09-07T06:34:33.4208470Z #22 616.1 ptxas info : 10 bytes gmem 2025-09-07T06:34:33.4212048Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.4219608Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4225163Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:34:33.4226195Z #22 616.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:34:33.4226963Z #22 616.1 ptxas info : Compile time = 737.700 ms 2025-09-07T06:34:33.4233181Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.4244611Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4250756Z #22 616.1 32 bytes stack frame, 56 bytes spill stores, 44 bytes spill loads 2025-09-07T06:34:33.4252039Z #22 616.1 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:34:33.4253374Z #22 616.1 ptxas info : Compile time = 1019.615 ms 2025-09-07T06:34:33.4258998Z #22 616.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.4266207Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4270503Z #22 616.1 136 bytes stack frame, 236 bytes spill stores, 340 bytes spill loads 2025-09-07T06:34:33.4271531Z #22 616.1 ptxas info : Used 168 registers, used 16 barriers, 136 bytes cumulative stack size 2025-09-07T06:34:33.4272358Z #22 616.1 ptxas info : Compile time = 2400.597 ms 2025-09-07T06:34:33.4277658Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.4287817Z #22 616.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4293809Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4294882Z #22 616.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:34:33.4295999Z #22 616.1 ptxas info : Compile time = 1523.916 ms 2025-09-07T06:34:33.4302063Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.4312155Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4316068Z #22 616.1 24 bytes stack frame, 52 bytes spill stores, 44 bytes spill loads 2025-09-07T06:34:33.4316922Z #22 616.1 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 
2025-09-07T06:34:33.4317701Z #22 616.1 ptxas info : Compile time = 1961.110 ms 2025-09-07T06:34:33.4321728Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.4331952Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4338445Z #22 616.1 104 bytes stack frame, 152 bytes spill stores, 208 bytes spill loads 2025-09-07T06:34:33.4340015Z #22 616.1 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:34:33.4341160Z #22 616.1 ptxas info : Compile time = 3614.108 ms 2025-09-07T06:34:33.4346798Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 
2025-09-07T06:34:33.4357014Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb1ELb0EEENS1_19SingleTileSchedulerILb0ELb1ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4362652Z #22 616.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.4363519Z #22 616.1 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:34:33.4364344Z #22 616.1 ptxas info : Compile time = 1007.813 ms 2025-09-07T06:34:33.4368181Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.4375851Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4380562Z #22 616.1 24 bytes stack frame, 52 bytes spill stores, 40 bytes spill loads 2025-09-07T06:34:33.4381904Z #22 616.1 ptxas 
info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:34:33.4383071Z #22 616.1 ptxas info : Compile time = 1320.931 ms 2025-09-07T06:34:33.4389069Z #22 616.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.4401822Z #22 616.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.4408370Z #22 616.1 104 bytes stack frame, 156 bytes spill stores, 212 bytes spill loads 2025-09-07T06:34:33.4409909Z #22 616.1 ptxas info : Used 168 registers, used 16 barriers, 104 bytes cumulative stack size 2025-09-07T06:34:33.4411046Z #22 616.1 ptxas info : Compile time = 2819.390 ms 2025-09-07T06:34:33.6341303Z #22 616.3 [59/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper 
-I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:33.7908291Z #22 616.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:34:33.7916161Z #22 616.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.7926863Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.7930761Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.7931545Z #22 616.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:34:33.7932235Z #22 616.3 ptxas info : Compile time = 1.818 ms 2025-09-07T06:34:33.7936084Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.7943388Z #22 616.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.7948485Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.7949707Z #22 616.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:34:33.7950721Z #22 616.3 ptxas info : Compile time = 20.960 ms 2025-09-07T06:34:33.7956767Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.7968392Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.7974814Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.7975965Z #22 616.3 ptxas info : Used 4 registers, used 0 barriers, 2496 
bytes cmem[0] 2025-09-07T06:34:33.7976969Z #22 616.3 ptxas info : Compile time = 1.025 ms 2025-09-07T06:34:33.7983182Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.7991145Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.7995398Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.7996264Z #22 616.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:34:33.7996928Z #22 616.3 ptxas info : Compile time = 0.680 ms 2025-09-07T06:34:33.8000840Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:34:33.8011878Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8017821Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.8019007Z #22 616.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:34:33.8020005Z #22 616.3 ptxas info : Compile time = 0.594 ms 2025-09-07T06:34:33.8026133Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.8037699Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8044125Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
2025-09-07T06:34:33.8045154Z #22 616.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:34:33.8046150Z #22 616.3 ptxas info : Compile time = 0.573 ms 2025-09-07T06:34:33.8051552Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.8058910Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8062957Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.8063753Z #22 616.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:34:33.8064433Z #22 616.3 ptxas info : Compile time = 0.562 ms 2025-09-07T06:34:33.8068979Z #22 616.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.8080700Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8087028Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.8088235Z #22 616.3 ptxas info : Used 4 registers, used 0 barriers, 2752 bytes cmem[0] 2025-09-07T06:34:33.8091628Z #22 616.3 ptxas info : Compile time = 0.613 ms 2025-09-07T06:34:33.8100261Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.8111604Z #22 616.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8115598Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.8116369Z #22 616.3 ptxas info : Used 4 registers, used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:34:33.8117028Z #22 616.3 ptxas info : Compile time = 0.557 ms 2025-09-07T06:34:33.8121162Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:33.8128795Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8135038Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.8136275Z #22 616.3 ptxas info : Used 4 registers, 
used 0 barriers, 2496 bytes cmem[0] 2025-09-07T06:34:33.8137295Z #22 616.3 ptxas info : Compile time = 0.545 ms 2025-09-07T06:34:33.8138048Z #22 616.3 ptxas info : 10 bytes gmem 2025-09-07T06:34:33.8143811Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.8154430Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8160029Z #22 616.3 8 bytes stack frame, 12 bytes spill stores, 12 bytes spill loads 2025-09-07T06:34:33.8161354Z #22 616.3 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:34:33.8162422Z #22 616.3 ptxas info : Compile time = 792.613 ms 2025-09-07T06:34:33.8168413Z #22 616.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.8177907Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb0ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8181657Z #22 616.3 24 bytes stack frame, 68 bytes spill stores, 68 bytes spill loads 2025-09-07T06:34:33.8182505Z #22 616.3 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:34:33.8183265Z #22 616.3 ptxas info : Compile time = 779.569 ms 2025-09-07T06:34:33.8187426Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.8196167Z #22 616.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8202533Z #22 616.3 40 bytes stack frame, 84 bytes spill stores, 100 bytes spill loads 2025-09-07T06:34:33.8203813Z #22 616.3 ptxas info : Used 168 registers, used 9 barriers, 40 bytes cumulative stack size 2025-09-07T06:34:33.8204927Z #22 616.3 ptxas info : Compile time = 956.649 ms 2025-09-07T06:34:33.8211367Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.8223324Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8229524Z #22 616.3 56 bytes stack frame, 280 bytes spill stores, 304 bytes spill loads 2025-09-07T06:34:33.8230997Z #22 616.3 
ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:34:33.8232137Z #22 616.3 ptxas info : Compile time = 1814.494 ms 2025-09-07T06:34:33.8237129Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.8244097Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8248083Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.8248803Z #22 616.3 ptxas info : Used 168 registers, used 9 barriers 2025-09-07T06:34:33.8249393Z #22 616.3 ptxas info : Compile time = 1411.070 ms 2025-09-07T06:34:33.8254939Z #22 616.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.8266496Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8272931Z #22 616.3 32 bytes stack frame, 104 bytes spill stores, 116 bytes spill loads 2025-09-07T06:34:33.8274260Z #22 616.3 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:34:33.8275545Z #22 616.3 ptxas info : Compile time = 1624.510 ms 2025-09-07T06:34:33.8281630Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.8293434Z #22 616.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8298863Z #22 616.3 48 bytes stack frame, 228 bytes spill stores, 284 bytes spill loads 2025-09-07T06:34:33.8299752Z #22 616.3 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:34:33.8300670Z #22 616.3 ptxas info : Compile time = 2778.803 ms 2025-09-07T06:34:33.8304739Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.8311828Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb0ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8316408Z #22 616.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:33.8317284Z #22 616.3 ptxas info : Used 168 
registers, used 9 barriers 2025-09-07T06:34:33.8318211Z #22 616.3 ptxas info : Compile time = 1118.518 ms 2025-09-07T06:34:33.8324962Z #22 616.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.8336466Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8342658Z #22 616.3 32 bytes stack frame, 64 bytes spill stores, 72 bytes spill loads 2025-09-07T06:34:33.8343985Z #22 616.3 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:34:33.8345506Z #22 616.3 ptxas info : Compile time = 1328.229 ms 2025-09-07T06:34:33.8351635Z #22 616.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:33.8361129Z #22 616.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb0ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:33.8365080Z #22 616.3 48 bytes stack frame, 312 bytes spill stores, 348 bytes spill loads 2025-09-07T06:34:33.8365960Z #22 616.3 ptxas info : Used 168 registers, used 16 barriers, 48 bytes cumulative stack size 2025-09-07T06:34:33.8366717Z #22 616.3 ptxas info : Compile time = 2288.441 ms 2025-09-07T06:34:38.6578702Z #22 621.3 [60/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_packgqa_sm90.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c 
/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_packgqa_sm90.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_fp16_packgqa_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:38.8173467Z #22 621.3 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:34:38.8178616Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:38.8187386Z #22 
621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8194011Z #22 621.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:38.8195098Z #22 621.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:38.8196063Z #22 621.3 ptxas info : Compile time = 1.664 ms 2025-09-07T06:34:38.8200873Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:38.8210784Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8215758Z #22 621.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:38.8216668Z #22 621.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 
2025-09-07T06:34:38.8217452Z #22 621.3 ptxas info : Compile time = 0.808 ms 2025-09-07T06:34:38.8222514Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:38.8231968Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8236955Z #22 621.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:38.8237931Z #22 621.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:38.8238754Z #22 621.3 ptxas info : Compile time = 0.654 ms 2025-09-07T06:34:38.8243811Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 
2025-09-07T06:34:38.8254048Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8259578Z #22 621.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:38.8260634Z #22 621.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:38.8261535Z #22 621.3 ptxas info : Compile time = 0.776 ms 2025-09-07T06:34:38.8266857Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:38.8276862Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8282173Z #22 621.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:38.8283219Z 
#22 621.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:34:38.8284124Z #22 621.3 ptxas info : Compile time = 0.546 ms 2025-09-07T06:34:38.8289561Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:38.8299686Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8304865Z #22 621.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:38.8305978Z #22 621.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:38.8306913Z #22 621.3 ptxas info : Compile time = 0.534 ms 2025-09-07T06:34:38.8312459Z #22 621.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:38.8322804Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8328444Z #22 621.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:38.8329525Z #22 621.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:38.8330428Z #22 621.3 ptxas info : Compile time = 0.523 ms 2025-09-07T06:34:38.8335933Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:38.8344998Z #22 621.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8350120Z #22 621.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:38.8351181Z #22 621.3 ptxas info : Used 4 registers, used 0 barriers, 2624 bytes cmem[0] 2025-09-07T06:34:38.8352041Z #22 621.3 ptxas info : Compile time = 0.479 ms 2025-09-07T06:34:38.8357302Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:38.8367309Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8372489Z #22 621.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:38.8373633Z #22 621.3 ptxas info : Used 4 registers, used 0 barriers, 2560 
bytes cmem[0] 2025-09-07T06:34:38.8374441Z #22 621.3 ptxas info : Compile time = 0.524 ms 2025-09-07T06:34:38.8379499Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:38.8388750Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8394546Z #22 621.3 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:38.8395592Z #22 621.3 ptxas info : Used 4 registers, used 0 barriers, 2560 bytes cmem[0] 2025-09-07T06:34:38.8396527Z #22 621.3 ptxas info : Compile time = 0.511 ms 2025-09-07T06:34:38.8397198Z #22 621.3 ptxas info : 10 bytes gmem 2025-09-07T06:34:38.8402112Z #22 621.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:38.8411538Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8417998Z #22 621.3 24 bytes stack frame, 52 bytes spill stores, 56 bytes spill loads 2025-09-07T06:34:38.8419198Z #22 621.3 ptxas info : Used 168 registers, used 9 barriers, 24 bytes cumulative stack size 2025-09-07T06:34:38.8420254Z #22 621.3 ptxas info : Compile time = 841.347 ms 2025-09-07T06:34:38.8425333Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:38.8434851Z #22 621.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi2EEENS7_ILi1EEES9_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSB_SD_SC_EEESA_SF_SH_Li256ELb0ELb1ELb0ELb0EEENS1_29StaticPersistentTileSchedulerILb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8440023Z #22 621.3 32 bytes stack frame, 108 bytes spill stores, 112 bytes spill loads 2025-09-07T06:34:38.8441065Z #22 621.3 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:34:38.8441938Z #22 621.3 ptxas info : Compile time = 842.896 ms 2025-09-07T06:34:38.8446871Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:38.8457064Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8462786Z #22 621.3 56 bytes stack frame, 240 bytes spill stores, 252 bytes spill loads 2025-09-07T06:34:38.8463937Z #22 621.3 ptxas info : Used 168 registers, used 9 
barriers, 56 bytes cumulative stack size 2025-09-07T06:34:38.8464957Z #22 621.3 ptxas info : Compile time = 1005.351 ms 2025-09-07T06:34:38.8470491Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:38.8480820Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8486378Z #22 621.3 64 bytes stack frame, 268 bytes spill stores, 308 bytes spill loads 2025-09-07T06:34:38.8487503Z #22 621.3 ptxas info : Used 168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:34:38.8488521Z #22 621.3 ptxas info : Compile time = 1985.651 ms 2025-09-07T06:34:38.8493746Z #22 621.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:38.8503038Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8508031Z #22 621.3 32 bytes stack frame, 100 bytes spill stores, 88 bytes spill loads 2025-09-07T06:34:38.8509377Z #22 621.3 ptxas info : Used 168 registers, used 9 barriers, 32 bytes cumulative stack size 2025-09-07T06:34:38.8510492Z #22 621.3 ptxas info : Compile time = 1625.579 ms 2025-09-07T06:34:38.8515640Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:38.8525261Z #22 621.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8530584Z #22 621.3 48 bytes stack frame, 108 bytes spill stores, 140 bytes spill loads 2025-09-07T06:34:38.8531657Z #22 621.3 ptxas info : Used 168 registers, used 9 barriers, 48 bytes cumulative stack size 2025-09-07T06:34:38.8532775Z #22 621.3 ptxas info : Compile time = 1802.860 ms 2025-09-07T06:34:38.8537820Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:38.8547648Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi96EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8553085Z #22 621.3 64 bytes stack frame, 256 bytes spill stores, 312 bytes spill loads 2025-09-07T06:34:38.8554285Z #22 621.3 ptxas info : Used 
168 registers, used 16 barriers, 64 bytes cumulative stack size 2025-09-07T06:34:38.8555335Z #22 621.3 ptxas info : Compile time = 3003.270 ms 2025-09-07T06:34:38.8560628Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:38.8570493Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb0ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8576033Z #22 621.3 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads 2025-09-07T06:34:38.8577203Z #22 621.3 ptxas info : Used 168 registers, used 9 barriers, 8 bytes cumulative stack size 2025-09-07T06:34:38.8578203Z #22 621.3 ptxas info : Compile time = 1300.356 ms 2025-09-07T06:34:38.8583630Z #22 621.3 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:38.8593547Z #22 621.3 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb0ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8598851Z #22 621.3 40 bytes stack frame, 148 bytes spill stores, 172 bytes spill loads 2025-09-07T06:34:38.8599976Z #22 621.3 ptxas info : Used 168 registers, used 9 barriers, 40 bytes cumulative stack size 2025-09-07T06:34:38.8600894Z #22 621.3 ptxas info : Compile time = 1404.669 ms 2025-09-07T06:34:38.8606461Z #22 621.3 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:38.8616989Z #22 621.3 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi192EEEEEELi192ENS_6half_tEfNS_4arch4Sm90ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJSA_SC_SB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:38.8622631Z #22 621.3 56 bytes stack frame, 248 bytes spill stores, 292 bytes spill loads 2025-09-07T06:34:38.8623870Z #22 621.3 ptxas info : Used 168 registers, used 16 barriers, 56 bytes cumulative stack size 2025-09-07T06:34:38.8624924Z #22 621.3 ptxas info : Compile time = 2530.685 ms 2025-09-07T06:34:48.3880621Z #22 631.1 [61/154] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm80.o.d -I/workspace/xformers/third_party/flash-attention/csrc/cutlass/include -I/workspace/xformers/third_party/flash-attention/hopper -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -I/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda/include -I/opt/_internal/cpython-3.12.11/include/python3.12 -c -c /workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm80.cu -o /workspace/xformers/build/temp.linux-x86_64-cpython-312/workspace/xformers/third_party/flash-attention/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DHAS_PYTORCH --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda 
-D_ENABLE_EXTENDED_ALIGNED_STORAGE -std=c++17 --generate-line-info -DNDEBUG --threads 4 --ptxas-options=-v -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -DCUTE_SM90_EXTENDED_MMA_SHAPES_ENABLED -DCUTLASS_ENABLE_GDC_FOR_SM90 -D_USE_MATH_DEFINES -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_90a,code=sm_90a -DFLASHATTENTION_DISABLE_SOFTCAP -DFLASHATTENTION_DISABLE_FP8 --generate-line-info -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -DTORCH_EXTENSION_NAME=_C 2025-09-07T06:34:48.5431967Z #22 631.1 ptxas info : 10 bytes gmem 2025-09-07T06:34:48.5438182Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5447910Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5452696Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5453765Z 
#22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5455037Z #22 631.1 ptxas info : Compile time = 1.924 ms 2025-09-07T06:34:48.5460442Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5469546Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5474839Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5475737Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5476460Z #22 631.1 ptxas info : Compile time = 0.925 ms 2025-09-07T06:34:48.5481109Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5489830Z #22 631.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5495182Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5496196Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5496938Z #22 631.1 ptxas info : Compile time = 0.628 ms 2025-09-07T06:34:48.5501808Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5511168Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5516237Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5517223Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5518007Z #22 631.1 ptxas info : Compile time = 0.614 ms 2025-09-07T06:34:48.5523193Z #22 631.1 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5544756Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5549877Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5550824Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5551590Z #22 631.1 ptxas info : Compile time = 0.836 ms 2025-09-07T06:34:48.5556905Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5565919Z #22 631.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5570785Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5571732Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5572733Z #22 631.1 ptxas info : Compile time = 0.591 ms 2025-09-07T06:34:48.5578364Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5587878Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5593237Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5594183Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5594964Z #22 631.1 ptxas info : Compile time = 0.701 ms 
2025-09-07T06:34:48.5600431Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5608972Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5614079Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5615042Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5615833Z #22 631.1 ptxas info : Compile time = 0.674 ms 2025-09-07T06:34:48.5620861Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5629067Z #22 631.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5633871Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5634780Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5635555Z #22 631.1 ptxas info : Compile time = 0.661 ms 2025-09-07T06:34:48.5640716Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5650215Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5655272Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5656206Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5656923Z #22 631.1 ptxas info : Compile time = 0.661 ms 2025-09-07T06:34:48.5661607Z #22 631.1 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5670480Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5675577Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5676522Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5677353Z #22 631.1 ptxas info : Compile time = 0.651 ms 2025-09-07T06:34:48.5682604Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5693957Z #22 631.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5698717Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5699638Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5700366Z #22 631.1 ptxas info : Compile time = 0.678 ms 2025-09-07T06:34:48.5705044Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5713780Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5718850Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5719816Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5720549Z #22 631.1 ptxas info : Compile time = 0.646 ms 2025-09-07T06:34:48.5725169Z #22 631.1 ptxas info 
: Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5734708Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5739924Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5740878Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5741678Z #22 631.1 ptxas info : Compile time = 0.688 ms 2025-09-07T06:34:48.5746590Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5755630Z #22 631.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5760755Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5761671Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5762476Z #22 631.1 ptxas info : Compile time = 0.681 ms 2025-09-07T06:34:48.5767571Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5777150Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5782543Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5783478Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5784236Z #22 631.1 ptxas info : Compile time = 0.683 ms 
2025-09-07T06:34:48.5788966Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5798208Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5803138Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5804060Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5805060Z #22 631.1 ptxas info : Compile time = 0.643 ms 2025-09-07T06:34:48.5810264Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_90a' 2025-09-07T06:34:48.5819429Z #22 631.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5824510Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5825475Z #22 631.1 ptxas info : Used 4 registers, used 0 barriers 2025-09-07T06:34:48.5826249Z #22 631.1 ptxas info : Compile time = 0.661 ms 2025-09-07T06:34:48.5827020Z #22 631.1 ptxas info : 10 bytes gmem, 80 bytes cmem[4] 2025-09-07T06:34:48.5832229Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.5841155Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5846248Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5847297Z #22 631.1 ptxas info : Used 238 registers, used 2 barriers, 1400 bytes cmem[0] 
2025-09-07T06:34:48.5848403Z #22 631.1 ptxas info : Compile time = 736.615 ms 2025-09-07T06:34:48.5853562Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.5862332Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5867263Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5868356Z #22 631.1 ptxas info : Used 230 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:48.5869258Z #22 631.1 ptxas info : Compile time = 825.475 ms 2025-09-07T06:34:48.5874267Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.5883230Z #22 631.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5888306Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5889389Z #22 631.1 ptxas info : Used 234 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:48.5890286Z #22 631.1 ptxas info : Compile time = 1535.186 ms 2025-09-07T06:34:48.5895937Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.5904963Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5910058Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5911108Z #22 631.1 ptxas info : Used 237 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:48.5912019Z #22 631.1 ptxas info : Compile time = 1684.599 ms 
2025-09-07T06:34:48.5917002Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.5925830Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5930669Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5931669Z #22 631.1 ptxas info : Used 237 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:48.5932411Z #22 631.1 ptxas info : Compile time = 1885.932 ms 2025-09-07T06:34:48.5937561Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.5946507Z #22 631.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5951417Z #22 631.1 112 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5952962Z #22 631.1 ptxas info : Used 250 registers, used 2 barriers, 112 bytes cumulative stack size, 1400 bytes cmem[0] 2025-09-07T06:34:48.5954076Z #22 631.1 ptxas info : Compile time = 3851.989 ms 2025-09-07T06:34:48.5960947Z #22 631.1 ptxas info : Function properties for _ZZN5flash25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS1_1CILi128EEENS3_ILi64EEENS3_ILi192EEEEEELi192EN7cutlass10bfloat16_tEfNS8_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb0EE3mmaINS_16FlashAttnFwdSm80ISC_NS_21CollectiveEpilogueFwdINS2_IJS4_S6_S5_EEENS2_IJNS3_ILi1EEESH_SH_EEES9_SB_Li256ELb1ELb1ELb0ELb0EEENS_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm96EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESS_EEESH_NS3_ILi24EEEEEENS2_IJNS2_IJSH_SS_EEENS3_ILi0EEENS3_ILi4EEEEEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSC_6ParamsERT0_RT1_iRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUlvE_clEv 2025-09-07T06:34:48.5968132Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5973398Z #22 631.1 ptxas info : Compiling entry function 
'_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.5982752Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_30DynamicPersistentTileSchedulerILi256ELi256ELb0ELb1ELb0EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.5988014Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.5989041Z #22 631.1 ptxas info : Used 251 registers, used 6 barriers, 1416 bytes cmem[0] 2025-09-07T06:34:48.5989978Z #22 631.1 ptxas info : Compile time = 1381.077 ms 2025-09-07T06:34:48.5995141Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.6004570Z #22 631.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.6009354Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.6010360Z #22 631.1 ptxas info : Used 241 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:48.6011241Z #22 631.1 ptxas info : Compile time = 1216.241 ms 2025-09-07T06:34:48.6016346Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.6025868Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi1ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb1ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.6030645Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.6031685Z #22 631.1 ptxas info : Used 239 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:48.6032579Z #22 631.1 ptxas info : Compile time = 2398.193 ms 
2025-09-07T06:34:48.6037306Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.6046477Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.6051717Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.6053008Z #22 631.1 ptxas info : Used 244 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:48.6053944Z #22 631.1 ptxas info : Compile time = 696.437 ms 2025-09-07T06:34:48.6058995Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.6067982Z #22 631.1 ptxas info : Function properties for 
_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.6072762Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.6073788Z #22 631.1 ptxas info : Used 246 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:48.6074687Z #22 631.1 ptxas info : Compile time = 808.098 ms 2025-09-07T06:34:48.6079635Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.6088496Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb0ELb0ELb1ELb1ELb1ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb1ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.6093895Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.6094945Z #22 631.1 ptxas info : Used 223 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:48.6095881Z #22 631.1 ptxas info : Compile time = 1640.171 ms 
2025-09-07T06:34:48.6100871Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE' for 'sm_80' 2025-09-07T06:34:48.6109808Z #22 631.1 ptxas info : Function properties for _ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb0ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb0ELb1ELb0ELb0EEENS1_19SingleTileSchedulerILb0ELb0ELb1ELi128EEEEEEEEEvNT_6ParamsE 2025-09-07T06:34:48.6114895Z #22 631.1 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 2025-09-07T06:34:48.6115956Z #22 631.1 ptxas info : Used 236 registers, used 2 barriers, 1400 bytes cmem[0] 2025-09-07T06:34:48.6116836Z #22 631.1 ptxas info : Compile time = 1799.138 ms 2025-09-07T06:34:48.6120828Z #22 631.1 ptxas info : Compiling entry function '_ZN7cutlass13device_kernelIN5flash19enable_sm80_to_sm89INS1_16FlashAttnFwdSm80INS1_25CollectiveMainloopFwdSm80ILi8ELi2ELb0EN4cute5tupleIJNS5_1CILi128EEENS7_ILi64EEENS7_ILi192EEEEEELi192ENS_10bfloat16_tEfNS_4arch4Sm80ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb0EEENS1_21CollectiveEpilogueFwdINS6_IJS8_SA_S9_EEENS6_IJNS7_ILi1EEESI_SI_EEESC_SE_Li256ELb1ELb1ELb0ELb0EEENS1_19SingleTileSche 2025-09-07T06:34:48.6125092Z #22 631.1 [output clipped, log limit 2MiB reached] 2025-09-07T06:52:21.0401697Z #22 1683.7 /opt/python/cp312-cp312/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:90: SetuptoolsDeprecationWarning: setup.py install is 
deprecated. 2025-09-07T06:52:21.0402714Z #22 1683.7 !! 2025-09-07T06:52:21.0402954Z #22 1683.7 2025-09-07T06:52:21.0403262Z #22 1683.7 ******************************************************************************** 2025-09-07T06:52:21.0403727Z #22 1683.7 Please avoid running ``setup.py`` directly. 2025-09-07T06:52:21.0404257Z #22 1683.7 Instead, use pypa/build, pypa/installer or other 2025-09-07T06:52:21.0405085Z #22 1683.7 standards-based tools. 2025-09-07T06:52:21.0405414Z #22 1683.7 2025-09-07T06:52:21.0405908Z #22 1683.7 See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details. 2025-09-07T06:52:21.0406526Z #22 1683.7 ******************************************************************************** 2025-09-07T06:52:21.0406904Z #22 1683.7 2025-09-07T06:52:21.0407112Z #22 1683.7 !! 2025-09-07T06:52:21.0407371Z #22 1683.7 self.initialize_options() 2025-09-07T06:52:21.9552557Z #22 1684.6 warning: no files found matching 'third_party/flash-attention/version.txt' 2025-09-07T06:53:15.7421890Z #22 DONE 1738.4s 2025-09-07T06:53:15.8956534Z 2025-09-07T06:53:15.8958000Z #23 [base 17/20] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system xformers-dist/*.whl --verbose 2025-09-07T06:53:16.2224595Z #23 0.478 DEBUG uv 0.8.4 2025-09-07T06:53:16.4215563Z #23 0.481 DEBUG Searching for default Python interpreter in managed installations or search path 2025-09-07T06:53:16.4216489Z #23 0.481 DEBUG Searching for managed installations at `/root/.local/share/uv/python` 2025-09-07T06:53:16.4217479Z #23 0.484 DEBUG Found `cpython-3.12.11-linux-x86_64-gnu` at `/opt/python/cp312-cp312/bin/python` (first executable in the search path) 2025-09-07T06:53:16.4218346Z #23 0.484 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T06:53:16.4218904Z #23 0.484 DEBUG Acquired lock for `/opt/python/cp312-cp312` 2025-09-07T06:53:16.4219837Z #23 0.489 DEBUG At least one requirement is not satisfied: 
file:///workspace/xformers-dist/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T06:53:16.4220715Z #23 0.489 DEBUG Using request timeout of 500s 2025-09-07T06:53:16.4221184Z #23 0.497 DEBUG Solving with installed Python version: 3.12.11 2025-09-07T06:53:16.4221691Z #23 0.497 DEBUG Solving with target Python version: >=3.12.11 2025-09-07T06:53:16.4222175Z #23 0.497 DEBUG Adding direct dependency: xformers* 2025-09-07T06:53:16.4223113Z #23 0.497 DEBUG Searching for a compatible version of xformers @ file:///workspace/xformers-dist/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl (*) 2025-09-07T06:53:16.4224357Z #23 0.497 DEBUG Adding transitive dependency for xformers==0.0.33+5d4b92a5.d20250907: torch>=2.8 2025-09-07T06:53:16.4225258Z #23 0.497 DEBUG Adding transitive dependency for xformers==0.0.33+5d4b92a5.d20250907: numpy* 2025-09-07T06:53:16.4225950Z #23 0.498 DEBUG Found stale response for: https://pypi.org/simple/torch/ 2025-09-07T06:53:16.4226597Z #23 0.498 DEBUG Sending revalidation request for: https://pypi.org/simple/torch/ 2025-09-07T06:53:16.4227669Z #23 0.500 DEBUG Found stale response for: https://pypi.org/simple/numpy/ 2025-09-07T06:53:16.4228318Z #23 0.500 DEBUG Sending revalidation request for: https://pypi.org/simple/numpy/ 2025-09-07T06:53:16.4228998Z #23 0.506 DEBUG Found not-modified response for: https://pypi.org/simple/numpy/ 2025-09-07T06:53:16.4229658Z #23 0.509 DEBUG Found not-modified response for: https://pypi.org/simple/torch/ 2025-09-07T06:53:16.4230759Z #23 0.510 DEBUG Found installed version of torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.8 2025-09-07T06:53:16.4231781Z #23 0.510 DEBUG Searching for a compatible version of torch (>=2.8) 2025-09-07T06:53:16.4232812Z #23 0.510 DEBUG Found installed version of torch==2.9.0.dev20250906+cu128 (from 
file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.8 2025-09-07T06:53:16.4233883Z #23 0.510 DEBUG Selecting: torch==2.9.0.dev20250906+cu128 [installed] (installed) 2025-09-07T06:53:16.4234557Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: filelock* 2025-09-07T06:53:16.4235359Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: typing-extensions>=4.10.0 2025-09-07T06:53:16.4236357Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: setuptools{python_full_version >= '3.12'}* 2025-09-07T06:53:16.4237250Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: sympy>=1.13.3 2025-09-07T06:53:16.4238019Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: networkx>=2.5.1 2025-09-07T06:53:16.4238753Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: jinja2* 2025-09-07T06:53:16.4239490Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: fsspec>=0.8.5 2025-09-07T06:53:16.4240605Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.93, <12.8.93+ 2025-09-07T06:53:16.4242098Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T06:53:16.4243635Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T06:53:16.4245082Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=9.10.2.21, <9.10.2.21+ 2025-09-07T06:53:16.4246530Z #23 0.510 
DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.4.1, <12.8.4.1+ 2025-09-07T06:53:16.4247985Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.3.3.83, <11.3.3.83+ 2025-09-07T06:53:16.4249421Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=10.3.9.90, <10.3.9.90+ 2025-09-07T06:53:16.4250891Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.7.3.90, <11.7.3.90+ 2025-09-07T06:53:16.4252650Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.5.8.93, <12.5.8.93+ 2025-09-07T06:53:16.4254204Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=0.7.1, <0.7.1+ 2025-09-07T06:53:16.4255717Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=2.27.5, <2.27.5+ 2025-09-07T06:53:16.4257175Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=3.3.20, <3.3.20+ 2025-09-07T06:53:16.4258629Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T06:53:16.4260111Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' 
and sys_platform == 'linux'}>=12.8.93, <12.8.93+ 2025-09-07T06:53:16.4261614Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=1.13.1.3, <1.13.1.3+ 2025-09-07T06:53:16.4263064Z #23 0.510 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T06:53:16.4264089Z #23 0.511 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T06:53:16.4264815Z #23 0.513 DEBUG Found stale response for: https://pypi.org/simple/filelock/ 2025-09-07T06:53:16.4265526Z #23 0.513 DEBUG Sending revalidation request for: https://pypi.org/simple/filelock/ 2025-09-07T06:53:16.4266247Z #23 0.513 DEBUG Found stale response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T06:53:16.4266994Z #23 0.513 DEBUG Sending revalidation request for: https://pypi.org/simple/typing-extensions/ 2025-09-07T06:53:16.4267714Z #23 0.513 DEBUG Found stale response for: https://pypi.org/simple/setuptools/ 2025-09-07T06:53:16.4268392Z #23 0.513 DEBUG Sending revalidation request for: https://pypi.org/simple/setuptools/ 2025-09-07T06:53:16.4269066Z #23 0.513 DEBUG Found stale response for: https://pypi.org/simple/sympy/ 2025-09-07T06:53:16.4269708Z #23 0.513 DEBUG Sending revalidation request for: https://pypi.org/simple/sympy/ 2025-09-07T06:53:16.4270347Z #23 0.513 DEBUG Found stale response for: https://pypi.org/simple/networkx/ 2025-09-07T06:53:16.4271015Z #23 0.513 DEBUG Sending revalidation request for: https://pypi.org/simple/networkx/ 2025-09-07T06:53:16.4271703Z #23 0.513 DEBUG Found stale response for: https://pypi.org/simple/jinja2/ 2025-09-07T06:53:16.4272355Z #23 0.513 DEBUG Sending revalidation request for: https://pypi.org/simple/jinja2/ 2025-09-07T06:53:16.4272987Z #23 0.513 DEBUG Found stale response for: https://pypi.org/simple/fsspec/ 
2025-09-07T06:53:16.4273632Z #23 0.513 DEBUG Sending revalidation request for: https://pypi.org/simple/fsspec/ 2025-09-07T06:53:16.4274353Z #23 0.513 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T06:53:16.4275132Z #23 0.513 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T06:53:16.4275937Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T06:53:16.4276742Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T06:53:16.4277551Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T06:53:16.4278346Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T06:53:16.4279104Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T06:53:16.4279892Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T06:53:16.4280634Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T06:53:16.4281392Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T06:53:16.4282166Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T06:53:16.4282914Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T06:53:16.4283669Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T06:53:16.4284411Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T06:53:16.4285173Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T06:53:16.4285936Z #23 0.514 DEBUG Sending revalidation request for: 
https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T06:53:16.4286713Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T06:53:16.4287488Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T06:53:16.4288262Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T06:53:16.4289058Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T06:53:16.4289808Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T06:53:16.4290547Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T06:53:16.4291342Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T06:53:16.4292349Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T06:53:16.4293301Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T06:53:16.4294051Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T06:53:16.4294838Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T06:53:16.4295634Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T06:53:16.4296429Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T06:53:16.4297207Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T06:53:16.4298032Z #23 0.514 DEBUG Found stale response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T06:53:16.4298783Z #23 0.514 DEBUG Sending revalidation request for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T06:53:16.4299526Z #23 0.515 DEBUG 
Found not-modified response for: https://pypi.org/simple/filelock/ 2025-09-07T06:53:16.4300434Z #23 0.515 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T06:53:16.4301338Z #23 0.515 DEBUG Found not-modified response for: https://pypi.org/simple/networkx/ 2025-09-07T06:53:16.4302046Z #23 0.515 DEBUG Found not-modified response for: https://pypi.org/simple/sympy/ 2025-09-07T06:53:16.4302796Z #23 0.515 DEBUG Found not-modified response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T06:53:16.4303559Z #23 0.516 DEBUG Found not-modified response for: https://pypi.org/simple/setuptools/ 2025-09-07T06:53:16.4304481Z #23 0.516 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T06:53:16.4305648Z #23 0.516 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T06:53:16.4306794Z #23 0.516 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T06:53:16.4307827Z #23 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/jinja2/ 2025-09-07T06:53:16.4308567Z #23 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T06:53:16.4309407Z #23 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/fsspec/ 2025-09-07T06:53:16.4310146Z #23 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T06:53:16.4310912Z #23 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T06:53:16.4311716Z #23 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T06:53:16.4312499Z #23 0.518 DEBUG Found not-modified response for: 
https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T06:53:16.4313310Z #23 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T06:53:16.4314127Z #23 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T06:53:16.4314929Z #23 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T06:53:16.4315733Z #23 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T06:53:16.4316516Z #23 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T06:53:16.4317323Z #23 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T06:53:16.4318154Z #23 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T06:53:16.4318941Z #23 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T06:53:16.4319703Z #23 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T06:53:16.4320545Z #23 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T06:53:16.4321330Z #23 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T06:53:16.4322357Z #23 0.519 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.93, <12.8.93+) 2025-09-07T06:53:16.4323890Z #23 0.519 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.8.93, <12.8.93+ 2025-09-07T06:53:16.4325148Z #23 0.519 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T06:53:16.4325960Z #23 0.519 DEBUG Adding 
transitive dependency for nvidia-cuda-nvrtc-cu12==12.8.93: nvidia-cuda-nvrtc-cu12==12.8.93 2025-09-07T06:53:16.4327159Z #23 0.519 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.8.93: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.93 2025-09-07T06:53:16.4328244Z #23 0.519 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12 (==12.8.93) 2025-09-07T06:53:16.4329448Z #23 0.519 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T06:53:16.4330630Z #23 0.519 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T06:53:16.4331451Z #23 0.520 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T06:53:16.4332751Z #23 0.520 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T06:53:16.4334184Z #23 0.520 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T06:53:16.4335720Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.93) 2025-09-07T06:53:16.4337222Z #23 0.520 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T06:53:16.4338464Z #23 0.520 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T06:53:16.4339475Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 
'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T06:53:16.4341078Z #23 0.520 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T06:53:16.4342372Z #23 0.520 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 2025-09-07T06:53:16.4343245Z #23 0.520 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.8.90: nvidia-cuda-runtime-cu12==12.8.90 2025-09-07T06:53:16.4344649Z #23 0.520 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.8.90: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T06:53:16.4345826Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12 (==12.8.90) 2025-09-07T06:53:16.4347073Z #23 0.520 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T06:53:16.4348253Z #23 0.520 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 2025-09-07T06:53:16.4349422Z #23 0.520 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T06:53:16.4350852Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T06:53:16.4352297Z #23 0.520 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T06:53:16.4353458Z #23 0.520 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 
2025-09-07T06:53:16.4354448Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T06:53:16.4355930Z #23 0.520 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T06:53:16.4357104Z #23 0.520 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T06:53:16.4357901Z #23 0.520 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.8.90: nvidia-cuda-cupti-cu12==12.8.90 2025-09-07T06:53:16.4359067Z #23 0.520 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.8.90: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T06:53:16.4360112Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12 (==12.8.90) 2025-09-07T06:53:16.4361292Z #23 0.520 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T06:53:16.4362421Z #23 0.520 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T06:53:16.4363596Z #23 0.520 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T06:53:16.4365002Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T06:53:16.4366419Z #23 0.520 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies 
==12.8.90 2025-09-07T06:53:16.4367562Z #23 0.520 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T06:53:16.4368478Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=9.10.2.21, <9.10.2.21+) 2025-09-07T06:53:16.4369825Z #23 0.520 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=9.10.2.21, <9.10.2.21+ 2025-09-07T06:53:16.4370874Z #23 0.520 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T06:53:16.4371610Z #23 0.520 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12==9.10.2.21 2025-09-07T06:53:16.4372987Z #23 0.520 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==9.10.2.21 2025-09-07T06:53:16.4374039Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cudnn-cu12 (==9.10.2.21) 2025-09-07T06:53:16.4375171Z #23 0.520 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T06:53:16.4376655Z #23 0.520 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T06:53:16.4377715Z #23 0.520 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T06:53:16.4378479Z #23 0.520 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T06:53:16.4379513Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==9.10.2.21) 2025-09-07T06:53:16.4380802Z #23 0.520 DEBUG Found installed version of 
nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies * 2025-09-07T06:53:16.4382281Z #23 0.520 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T06:53:16.4383349Z #23 0.520 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T06:53:16.4384094Z #23 0.520 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T06:53:16.4385333Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.4.1, <12.8.4.1+) 2025-09-07T06:53:16.4386672Z #23 0.520 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=12.8.4.1, <12.8.4.1+ 2025-09-07T06:53:16.4387723Z #23 0.520 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T06:53:16.4388475Z #23 0.520 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.8.4.1: nvidia-cublas-cu12==12.8.4.1 2025-09-07T06:53:16.4389564Z #23 0.520 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.8.4.1: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.4.1 2025-09-07T06:53:16.4390586Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cublas-cu12 (==12.8.4.1) 2025-09-07T06:53:16.4391610Z #23 0.520 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T06:53:16.4393084Z #23 0.520 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T06:53:16.4394173Z #23 0.520 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from 
file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T06:53:16.4395518Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.4.1) 2025-09-07T06:53:16.4396867Z #23 0.520 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T06:53:16.4397941Z #23 0.520 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T06:53:16.4398904Z #23 0.520 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.3.3.83, <11.3.3.83+) 2025-09-07T06:53:16.4400431Z #23 0.520 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=11.3.3.83, <11.3.3.83+ 2025-09-07T06:53:16.4401640Z #23 0.520 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T06:53:16.4402485Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-cufft-cu12==11.3.3.83 2025-09-07T06:53:16.4403650Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.3.3.83 2025-09-07T06:53:16.4404890Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cufft-cu12 (==11.3.3.83) 2025-09-07T06:53:16.4406022Z #23 0.521 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T06:53:16.4407609Z #23 0.521 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from 
file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T06:53:16.4408700Z #23 0.521 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T06:53:16.4409480Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-nvjitlink-cu12* 2025-09-07T06:53:16.4410461Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.3.3.83) 2025-09-07T06:53:16.4411821Z #23 0.521 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies * 2025-09-07T06:53:16.4413728Z #23 0.521 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T06:53:16.4414893Z #23 0.521 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T06:53:16.4415673Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-nvjitlink-cu12* 2025-09-07T06:53:16.4416772Z #23 0.521 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=10.3.9.90, <10.3.9.90+) 2025-09-07T06:53:16.4418209Z #23 0.521 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=10.3.9.90, <10.3.9.90+ 2025-09-07T06:53:16.4419398Z #23 0.521 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T06:53:16.4420202Z #23 0.521 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.9.90: nvidia-curand-cu12==10.3.9.90 2025-09-07T06:53:16.4421393Z #23 0.521 DEBUG Adding transitive dependency for 
nvidia-curand-cu12==10.3.9.90: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==10.3.9.90 2025-09-07T06:53:16.4422499Z #23 0.521 DEBUG Searching for a compatible version of nvidia-curand-cu12 (==10.3.9.90) 2025-09-07T06:53:16.4423614Z #23 0.521 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T06:53:16.4424814Z #23 0.521 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T06:53:16.4425842Z #23 0.521 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T06:53:16.4427116Z #23 0.521 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==10.3.9.90) 2025-09-07T06:53:16.4428404Z #23 0.521 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T06:53:16.4429417Z #23 0.521 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T06:53:16.4430368Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.7.3.90, <11.7.3.90+) 2025-09-07T06:53:16.4431805Z #23 0.521 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=11.7.3.90, <11.7.3.90+ 2025-09-07T06:53:16.4432895Z #23 0.521 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T06:53:16.4433695Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusolver-cu12==11.7.3.90 2025-09-07T06:53:16.4434860Z #23 0.521 DEBUG Adding 
transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.7.3.90 2025-09-07T06:53:16.4435890Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cusolver-cu12 (==11.7.3.90) 2025-09-07T06:53:16.4437023Z #23 0.521 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T06:53:16.4438071Z #23 0.521 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T06:53:16.4439140Z #23 0.521 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T06:53:16.4440281Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cublas-cu12* 2025-09-07T06:53:16.4441117Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-nvjitlink-cu12* 2025-09-07T06:53:16.4441978Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusparse-cu12* 2025-09-07T06:53:16.4442967Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.7.3.90) 2025-09-07T06:53:16.4444345Z #23 0.521 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies * 2025-09-07T06:53:16.4445871Z #23 0.521 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T06:53:16.4446947Z #23 0.521 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T06:53:16.4447697Z #23 0.521 DEBUG Adding 
transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cublas-cu12* 2025-09-07T06:53:16.4448578Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-nvjitlink-cu12* 2025-09-07T06:53:16.4449420Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusparse-cu12* 2025-09-07T06:53:16.4450482Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.5.8.93, <12.5.8.93+) 2025-09-07T06:53:16.4452041Z #23 0.521 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.5.8.93, <12.5.8.93+ 2025-09-07T06:53:16.4453464Z #23 0.521 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T06:53:16.4454312Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-cusparse-cu12==12.5.8.93 2025-09-07T06:53:16.4455534Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.5.8.93 2025-09-07T06:53:16.4456634Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cusparse-cu12 (==12.5.8.93) 2025-09-07T06:53:16.4457932Z #23 0.521 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T06:53:16.4459142Z #23 0.521 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T06:53:16.4460359Z #23 0.521 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T06:53:16.4461673Z #23 0.521 DEBUG Adding 
transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-nvjitlink-cu12* 2025-09-07T06:53:16.4462752Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.5.8.93) 2025-09-07T06:53:16.4464359Z #23 0.521 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T06:53:16.4465648Z #23 0.521 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T06:53:16.4466415Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-nvjitlink-cu12* 2025-09-07T06:53:16.4467458Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=0.7.1, <0.7.1+) 2025-09-07T06:53:16.4468825Z #23 0.521 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies >=0.7.1, <0.7.1+ 2025-09-07T06:53:16.4469914Z #23 0.521 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T06:53:16.4470678Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12==0.7.1 2025-09-07T06:53:16.4471830Z #23 0.521 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==0.7.1 2025-09-07T06:53:16.4472866Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12 (==0.7.1) 2025-09-07T06:53:16.4473922Z #23 0.521 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T06:53:16.4475383Z #23 0.521 DEBUG Found installed version of 
nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T06:53:16.4476420Z #23 0.521 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T06:53:16.4477348Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==0.7.1) 2025-09-07T06:53:16.4478647Z #23 0.521 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T06:53:16.4479678Z #23 0.521 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T06:53:16.4480579Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=2.27.5, <2.27.5+) 2025-09-07T06:53:16.4481949Z #23 0.521 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=2.27.5, <2.27.5+ 2025-09-07T06:53:16.4483024Z #23 0.521 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T06:53:16.4483727Z #23 0.521 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12==2.27.5 2025-09-07T06:53:16.4484762Z #23 0.521 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==2.27.5 2025-09-07T06:53:16.4485697Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nccl-cu12 (==2.27.5) 2025-09-07T06:53:16.4486805Z #23 0.521 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T06:53:16.4487843Z #23 0.521 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] 
(installed) 2025-09-07T06:53:16.4488893Z #23 0.521 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T06:53:16.4490193Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==2.27.5) 2025-09-07T06:53:16.4491484Z #23 0.521 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T06:53:16.4493178Z #23 0.521 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T06:53:16.4494138Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=3.3.20, <3.3.20+) 2025-09-07T06:53:16.4495643Z #23 0.521 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=3.3.20, <3.3.20+ 2025-09-07T06:53:16.4496853Z #23 0.521 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T06:53:16.4497634Z #23 0.521 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12==3.3.20 2025-09-07T06:53:16.4498806Z #23 0.521 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.3.20 2025-09-07T06:53:16.4499864Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12 (==3.3.20) 2025-09-07T06:53:16.4501054Z #23 0.521 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T06:53:16.4502733Z #23 0.521 DEBUG Found installed version of 
nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T06:53:16.4503947Z #23 0.521 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T06:53:16.4505011Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.3.20) 2025-09-07T06:53:16.4506528Z #23 0.521 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T06:53:16.4507613Z #23 0.521 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T06:53:16.4508508Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T06:53:16.4509885Z #23 0.521 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T06:53:16.4510970Z #23 0.521 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T06:53:16.4511679Z #23 0.521 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.8.90: nvidia-nvtx-cu12==12.8.90 2025-09-07T06:53:16.4512714Z #23 0.521 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.8.90: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T06:53:16.4513669Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nvtx-cu12 (==12.8.90) 2025-09-07T06:53:16.4514801Z #23 0.521 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T06:53:16.4516309Z #23 0.521 DEBUG Found 
installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T06:53:16.4517369Z #23 0.521 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T06:53:16.4518243Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T06:53:16.4519556Z #23 0.521 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T06:53:16.4520612Z #23 0.521 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T06:53:16.4521559Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.93, <12.8.93+) 2025-09-07T06:53:16.4523018Z #23 0.521 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.8.93, <12.8.93+ 2025-09-07T06:53:16.4524199Z #23 0.521 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T06:53:16.4524977Z #23 0.521 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.8.93: nvidia-nvjitlink-cu12==12.8.93 2025-09-07T06:53:16.4526135Z #23 0.521 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.8.93: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.93 2025-09-07T06:53:16.4527169Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12 (==12.8.93) 2025-09-07T06:53:16.4528332Z #23 0.521 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that 
satisfies ==12.8.93 2025-09-07T06:53:16.4529474Z #23 0.521 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T06:53:16.4530651Z #23 0.521 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T06:53:16.4532145Z #23 0.521 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.93) 2025-09-07T06:53:16.4533835Z #23 0.521 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T06:53:16.4535041Z #23 0.521 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T06:53:16.4536038Z #23 0.521 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=1.13.1.3, <1.13.1.3+) 2025-09-07T06:53:16.4537564Z #23 0.521 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=1.13.1.3, <1.13.1.3+ 2025-09-07T06:53:16.4538762Z #23 0.521 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T06:53:16.4539572Z #23 0.522 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.13.1.3: nvidia-cufile-cu12==1.13.1.3 2025-09-07T06:53:16.4540743Z #23 0.522 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.13.1.3: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==1.13.1.3 2025-09-07T06:53:16.4541789Z #23 0.522 DEBUG Searching for a compatible version of nvidia-cufile-cu12 (==1.13.1.3) 2025-09-07T06:53:16.4543054Z #23 0.522 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from 
file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T06:53:16.4544924Z #23 0.522 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T06:53:16.4546025Z #23 0.522 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T06:53:16.4546913Z #23 0.522 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==1.13.1.3) 2025-09-07T06:53:16.4548257Z #23 0.522 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T06:53:16.4549385Z #23 0.522 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T06:53:16.4550284Z #23 0.522 DEBUG Searching for a compatible version of pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T06:53:16.4551984Z #23 0.522 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T06:53:16.4553272Z #23 0.522 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T06:53:16.4554105Z #23 0.522 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton==3.4.0+gitf7888497 2025-09-07T06:53:16.4555326Z #23 0.522 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T06:53:16.4556423Z #23 0.522 DEBUG Searching for a compatible version of pytorch-triton (==3.4.0+gitf7888497) 
2025-09-07T06:53:16.4557730Z #23 0.522 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T06:53:16.4559047Z #23 0.522 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T06:53:16.4560340Z #23 0.522 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T06:53:16.4561725Z #23 0.522 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T06:53:16.4562845Z #23 0.522 DEBUG Searching for a compatible version of pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T06:53:16.4564408Z #23 0.522 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T06:53:16.4565788Z #23 0.522 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies >=40.8.0 2025-09-07T06:53:16.4566632Z #23 0.522 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T06:53:16.4567330Z #23 0.522 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T06:53:16.4567970Z #23 0.522 DEBUG Searching for a compatible version of numpy (*) 2025-09-07T06:53:16.4568480Z #23 0.522 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T06:53:16.4568969Z #23 0.522 DEBUG Selecting: numpy==2.2.6 [installed] (installed) 2025-09-07T06:53:16.4569451Z #23 0.522 DEBUG Searching for a compatible version of filelock (*) 
2025-09-07T06:53:16.4570194Z #23 0.522 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T06:53:16.4570915Z #23 0.522 DEBUG Selecting: filelock==3.19.1 [installed] (installed) 2025-09-07T06:53:16.4571459Z #23 0.522 DEBUG Searching for a compatible version of typing-extensions (>=4.10.0) 2025-09-07T06:53:16.4572614Z #23 0.522 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T06:53:16.4573606Z #23 0.522 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T06:53:16.4574343Z #23 0.522 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (*) 2025-09-07T06:53:16.4575335Z #23 0.522 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies * 2025-09-07T06:53:16.4576260Z #23 0.522 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T06:53:16.4576928Z #23 0.522 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools==78.1.0 2025-09-07T06:53:16.4577807Z #23 0.522 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools{python_full_version >= '3.12'}==78.1.0 2025-09-07T06:53:16.4578612Z #23 0.522 DEBUG Searching for a compatible version of setuptools (==78.1.0) 2025-09-07T06:53:16.4579527Z #23 0.522 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T06:53:16.4580402Z #23 0.522 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T06:53:16.4581284Z #23 0.522 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T06:53:16.4582320Z #23 0.522 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (==78.1.0) 
2025-09-07T06:53:16.4583350Z #23 0.522 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T06:53:16.4584225Z #23 0.522 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T06:53:16.4584995Z #23 0.522 DEBUG Searching for a compatible version of sympy (>=1.13.3) 2025-09-07T06:53:16.4585758Z #23 0.522 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T06:53:16.4586465Z #23 0.522 DEBUG Selecting: sympy==1.14.0 [installed] (installed) 2025-09-07T06:53:16.4587035Z #23 0.522 DEBUG Adding transitive dependency for sympy==1.14.0: mpmath>=1.1.0, <1.4 2025-09-07T06:53:16.4587619Z #23 0.522 DEBUG Searching for a compatible version of networkx (>=2.5.1) 2025-09-07T06:53:16.4588357Z #23 0.522 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T06:53:16.4589078Z #23 0.522 DEBUG Selecting: networkx==3.5 [installed] (installed) 2025-09-07T06:53:16.4589561Z #23 0.522 DEBUG Searching for a compatible version of jinja2 (*) 2025-09-07T06:53:16.4590237Z #23 0.522 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T06:53:16.4590925Z #23 0.522 DEBUG Selecting: jinja2==3.1.6 [installed] (installed) 2025-09-07T06:53:16.4591446Z #23 0.522 DEBUG Adding transitive dependency for jinja2==3.1.6: markupsafe>=2.0 2025-09-07T06:53:16.4592157Z #23 0.522 DEBUG Searching for a compatible version of fsspec (>=0.8.5) 2025-09-07T06:53:16.4593331Z #23 0.522 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T06:53:16.4594172Z #23 0.522 DEBUG Selecting: fsspec==2025.7.0 [installed] (installed) 2025-09-07T06:53:16.4594774Z #23 0.522 DEBUG Found stale response for: 
https://pypi.org/simple/mpmath/ 2025-09-07T06:53:16.4595434Z #23 0.522 DEBUG Sending revalidation request for: https://pypi.org/simple/mpmath/ 2025-09-07T06:53:16.4596197Z #23 0.523 DEBUG Found stale response for: https://pypi.org/simple/markupsafe/ 2025-09-07T06:53:16.4596899Z #23 0.523 DEBUG Sending revalidation request for: https://pypi.org/simple/markupsafe/ 2025-09-07T06:53:16.4597630Z #23 0.523 DEBUG Found not-modified response for: https://pypi.org/simple/mpmath/ 2025-09-07T06:53:16.4598298Z #23 0.524 DEBUG Searching for a compatible version of mpmath (>=1.1.0, <1.4) 2025-09-07T06:53:16.4599160Z #23 0.524 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T06:53:16.4599984Z #23 0.524 DEBUG Selecting: mpmath==1.3.0 [installed] (installed) 2025-09-07T06:53:16.4600613Z #23 0.524 DEBUG Found not-modified response for: https://pypi.org/simple/markupsafe/ 2025-09-07T06:53:16.4601553Z #23 0.524 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T06:53:16.4602464Z #23 0.524 DEBUG Searching for a compatible version of markupsafe (>=2.0) 2025-09-07T06:53:16.4603522Z #23 0.524 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T06:53:16.4605233Z #23 0.524 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T06:53:16.4606145Z #23 0.524 DEBUG Selecting: markupsafe==3.0.2 [installed] (installed) 2025-09-07T06:53:16.4608497Z #23 0.526 DEBUG Tried 28 versions: filelock 1, fsspec 1, jinja2 1, markupsafe 1, mpmath 1, networkx 1, numpy 1, nvidia-cublas-cu12 1, nvidia-cuda-cupti-cu12 1, nvidia-cuda-nvrtc-cu12 1, nvidia-cuda-runtime-cu12 1, nvidia-cudnn-cu12 1, 
nvidia-cufft-cu12 1, nvidia-cufile-cu12 1, nvidia-curand-cu12 1, nvidia-cusolver-cu12 1, nvidia-cusparse-cu12 1, nvidia-cusparselt-cu12 1, nvidia-nccl-cu12 1, nvidia-nvjitlink-cu12 1, nvidia-nvshmem-cu12 1, nvidia-nvtx-cu12 1, pytorch-triton 1, setuptools 1, sympy 1, torch 1, typing-extensions 1, xformers 1 2025-09-07T06:53:16.4610811Z #23 0.526 DEBUG marker environment resolution took 0.029s 2025-09-07T06:53:16.4611187Z #23 0.526 Resolved 28 packages in 33ms 2025-09-07T06:53:16.4612179Z #23 0.526 DEBUG Requirement already installed: nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:53:16.4613823Z #23 0.526 DEBUG Requirement already installed: nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T06:53:16.4615237Z #23 0.526 DEBUG Requirement already installed: nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:53:16.4616377Z #23 0.526 DEBUG Requirement already installed: sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T06:53:16.4627118Z #23 0.526 DEBUG Requirement already installed: nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:53:16.4628587Z #23 0.526 DEBUG Requirement already installed: nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:53:16.4629964Z #23 0.526 DEBUG Requirement already installed: nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:53:16.4631447Z #23 0.526 DEBUG Requirement already installed: pytorch-triton==3.4.0+gitf7888497 (from 
file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T06:53:16.4633004Z #23 0.526 DEBUG Requirement already installed: nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:53:16.4634473Z #23 0.526 DEBUG Requirement already installed: nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:53:16.4635856Z #23 0.526 DEBUG Requirement already installed: nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T06:53:16.4637207Z #23 0.526 DEBUG Requirement already installed: nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T06:53:16.4638277Z #23 0.526 DEBUG Requirement already installed: jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T06:53:16.4639466Z #23 0.526 DEBUG Requirement already installed: nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:53:16.4640592Z #23 0.526 DEBUG Requirement already installed: setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T06:53:16.4641668Z #23 0.526 DEBUG Requirement already installed: markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T06:53:16.4642901Z #23 0.526 DEBUG Identified uncached distribution: xformers @ file:///workspace/xformers-dist/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T06:53:16.4643965Z #23 0.526 DEBUG Requirement already installed: filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T06:53:16.4644638Z #23 0.526 DEBUG Requirement already 
installed: numpy==2.2.6 2025-09-07T06:53:16.4645582Z #23 0.526 DEBUG Requirement already installed: nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:53:16.4646947Z #23 0.526 DEBUG Requirement already installed: torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T06:53:16.4648251Z #23 0.526 DEBUG Requirement already installed: nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:53:16.4649371Z #23 0.526 DEBUG Requirement already installed: networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T06:53:16.4650315Z #23 0.526 DEBUG Requirement already installed: typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T06:53:16.4651288Z #23 0.526 DEBUG Requirement already installed: fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T06:53:16.4652654Z #23 0.526 DEBUG Requirement already installed: nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T06:53:16.4653858Z #23 0.526 DEBUG Requirement already installed: mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T06:53:16.4655107Z #23 0.526 DEBUG Requirement already installed: nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T06:53:16.4656128Z #23 0.526 DEBUG Unnecessary package: pyyaml==6.0.2 2025-09-07T06:53:16.4656618Z #23 0.526 DEBUG Unnecessary package: aiohappyeyeballs==2.6.1 2025-09-07T06:53:16.4657098Z #23 0.526 DEBUG Unnecessary package: aiohttp==3.12.15 2025-09-07T06:53:16.4657557Z #23 0.526 DEBUG Unnecessary package: aiosignal==1.4.0 
2025-09-07T06:53:16.4658028Z #23 0.526 DEBUG Unnecessary package: annotated-types==0.7.0 2025-09-07T06:53:16.4658500Z #23 0.526 DEBUG Unnecessary package: anyio==4.10.0 2025-09-07T06:53:16.4658919Z #23 0.526 DEBUG Unnecessary package: astor==0.8.1 2025-09-07T06:53:16.4659346Z #23 0.526 DEBUG Unnecessary package: attrs==25.3.0 2025-09-07T06:53:16.4659763Z #23 0.526 DEBUG Unnecessary package: blake3==1.0.5 2025-09-07T06:53:16.4660250Z #23 0.526 DEBUG Unnecessary package: build==1.3.0 2025-09-07T06:53:16.4660698Z #23 0.526 DEBUG Unnecessary package: cachetools==6.2.0 2025-09-07T06:53:16.4661129Z #23 0.526 DEBUG Unnecessary package: cbor2==5.7.0 2025-09-07T06:53:16.4661576Z #23 0.526 DEBUG Unnecessary package: certifi==2025.8.3 2025-09-07T06:53:16.4662003Z #23 0.526 DEBUG Unnecessary package: cffi==1.17.1 2025-09-07T06:53:16.4662482Z #23 0.526 DEBUG Unnecessary package: charset-normalizer==3.4.3 2025-09-07T06:53:16.4662957Z #23 0.526 DEBUG Unnecessary package: click==8.2.1 2025-09-07T06:53:16.4663411Z #23 0.526 DEBUG Unnecessary package: cloudpickle==3.1.1 2025-09-07T06:53:16.4663904Z #23 0.526 DEBUG Unnecessary package: compressed-tensors==0.11.0 2025-09-07T06:53:16.4664397Z #23 0.526 DEBUG Unnecessary package: depyf==0.19.0 2025-09-07T06:53:16.4664932Z #23 0.526 DEBUG Unnecessary package: dill==0.4.0 2025-09-07T06:53:16.4665311Z #23 0.526 DEBUG Unnecessary package: diskcache==5.6.3 2025-09-07T06:53:16.4665738Z #23 0.526 DEBUG Unnecessary package: distro==1.9.0 2025-09-07T06:53:16.4666124Z #23 0.526 DEBUG Unnecessary package: dnspython==2.7.0 2025-09-07T06:53:16.4666516Z #23 0.526 DEBUG Unnecessary package: einops==0.8.1 2025-09-07T06:53:16.4666922Z #23 0.526 DEBUG Unnecessary package: email-validator==2.3.0 2025-09-07T06:53:16.4667352Z #23 0.526 DEBUG Unnecessary package: fastapi==0.116.1 2025-09-07T06:53:16.4667765Z #23 0.526 DEBUG Unnecessary package: fastapi-cli==0.0.10 2025-09-07T06:53:16.4668196Z #23 0.526 DEBUG Unnecessary package: fastapi-cloud-cli==0.1.5 
2025-09-07T06:53:16.4668637Z #23 0.526 DEBUG Unnecessary package: frozendict==2.4.6 2025-09-07T06:53:16.4669042Z #23 0.526 DEBUG Unnecessary package: frozenlist==1.7.0 2025-09-07T06:53:16.4669438Z #23 0.526 DEBUG Unnecessary package: gguf==0.17.1 2025-09-07T06:53:16.4669805Z #23 0.526 DEBUG Unnecessary package: h11==0.16.0 2025-09-07T06:53:16.4670186Z #23 0.526 DEBUG Unnecessary package: hf-xet==1.1.9 2025-09-07T06:53:16.4670573Z #23 0.526 DEBUG Unnecessary package: httpcore==1.0.9 2025-09-07T06:53:16.4670987Z #23 0.526 DEBUG Unnecessary package: httptools==0.6.4 2025-09-07T06:53:16.4671383Z #23 0.526 DEBUG Unnecessary package: httpx==0.28.1 2025-09-07T06:53:16.4671794Z #23 0.526 DEBUG Unnecessary package: huggingface-hub==0.34.4 2025-09-07T06:53:16.4672251Z #23 0.526 DEBUG Unnecessary package: idna==3.10 2025-09-07T06:53:16.4672634Z #23 0.526 DEBUG Unnecessary package: interegular==0.3.3 2025-09-07T06:53:16.4673039Z #23 0.526 DEBUG Unnecessary package: jiter==0.10.0 2025-09-07T06:53:16.4673431Z #23 0.526 DEBUG Unnecessary package: jsonschema==4.25.1 2025-09-07T06:53:16.4673957Z #23 0.526 DEBUG Unnecessary package: jsonschema-specifications==2025.4.1 2025-09-07T06:53:16.4674437Z #23 0.526 DEBUG Unnecessary package: lark==1.2.2 2025-09-07T06:53:16.4674822Z #23 0.526 DEBUG Unnecessary package: llguidance==0.7.30 2025-09-07T06:53:16.4675236Z #23 0.526 DEBUG Unnecessary package: llvmlite==0.44.0 2025-09-07T06:53:16.4675668Z #23 0.526 DEBUG Unnecessary package: lm-format-enforcer==0.11.3 2025-09-07T06:53:16.4676139Z #23 0.526 DEBUG Unnecessary package: markdown-it-py==4.0.0 2025-09-07T06:53:16.4676542Z #23 0.526 DEBUG Unnecessary package: mdurl==0.1.2 2025-09-07T06:53:16.4676951Z #23 0.526 DEBUG Unnecessary package: mistral-common==1.8.4 2025-09-07T06:53:16.4677366Z #23 0.526 DEBUG Unnecessary package: msgspec==0.19.0 2025-09-07T06:53:16.4677775Z #23 0.526 DEBUG Unnecessary package: multidict==6.6.4 2025-09-07T06:53:16.4678168Z #23 0.526 DEBUG Unnecessary package: 
ninja==1.13.0 2025-09-07T06:53:16.4678543Z #23 0.526 DEBUG Unnecessary package: numba==0.61.2 2025-09-07T06:53:16.4678941Z #23 0.526 DEBUG Unnecessary package: openai==1.106.1 2025-09-07T06:53:16.4679349Z #23 0.526 DEBUG Unnecessary package: openai-harmony==0.0.4 2025-09-07T06:53:16.4679844Z #23 0.526 DEBUG Unnecessary package: opencv-python-headless==4.12.0.88 2025-09-07T06:53:16.4680310Z #23 0.526 DEBUG Unnecessary package: opt-einsum==3.4.0 2025-09-07T06:53:16.4680740Z #23 0.526 DEBUG Unnecessary package: outlines-core==0.2.10 2025-09-07T06:53:16.4681201Z #23 0.526 DEBUG Unnecessary package: packaging==25.0 2025-09-07T06:53:16.4681665Z #23 0.526 DEBUG Unnecessary package: partial-json-parser==0.2.1.1.post6 2025-09-07T06:53:16.4682481Z #23 0.526 DEBUG Unnecessary package: pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T06:53:16.4683211Z #23 0.526 DEBUG Preserving seed package: pip==25.2 2025-09-07T06:53:16.4683652Z #23 0.526 DEBUG Unnecessary package: prometheus-client==0.22.1 2025-09-07T06:53:16.4684190Z #23 0.526 DEBUG Unnecessary package: prometheus-fastapi-instrumentator==7.1.0 2025-09-07T06:53:16.4684719Z #23 0.526 DEBUG Unnecessary package: propcache==0.3.2 2025-09-07T06:53:16.4685131Z #23 0.526 DEBUG Unnecessary package: protobuf==6.32.0 2025-09-07T06:53:16.4685515Z #23 0.526 DEBUG Unnecessary package: psutil==7.0.0 2025-09-07T06:53:16.4685918Z #23 0.526 DEBUG Unnecessary package: py-cpuinfo==9.0.0 2025-09-07T06:53:16.4686346Z #23 0.526 DEBUG Unnecessary package: pybase64==1.4.2 2025-09-07T06:53:16.4686760Z #23 0.526 DEBUG Unnecessary package: pycountry==24.6.1 2025-09-07T06:53:16.4687152Z #23 0.526 DEBUG Unnecessary package: pycparser==2.22 2025-09-07T06:53:16.4687557Z #23 0.526 DEBUG Unnecessary package: pydantic==2.11.7 2025-09-07T06:53:16.4687979Z #23 0.526 DEBUG Unnecessary package: pydantic-core==2.33.2 2025-09-07T06:53:16.4688438Z #23 0.526 DEBUG Unnecessary package: 
pydantic-extra-types==2.10.5 2025-09-07T06:53:16.4688893Z #23 0.526 DEBUG Unnecessary package: pygments==2.19.2 2025-09-07T06:53:16.4689307Z #23 0.526 DEBUG Unnecessary package: pyproject-hooks==1.2.0 2025-09-07T06:53:16.4689752Z #23 0.526 DEBUG Unnecessary package: python-dotenv==1.1.1 2025-09-07T06:53:16.4690195Z #23 0.526 DEBUG Unnecessary package: python-json-logger==3.3.0 2025-09-07T06:53:16.4690665Z #23 0.526 DEBUG Unnecessary package: python-multipart==0.0.20 2025-09-07T06:53:16.4691078Z #23 0.526 DEBUG Unnecessary package: pyzmq==27.0.2 2025-09-07T06:53:16.4691491Z #23 0.526 DEBUG Unnecessary package: referencing==0.36.2 2025-09-07T06:53:16.4692127Z #23 0.526 DEBUG Unnecessary package: regex==2025.9.1 2025-09-07T06:53:16.4692720Z #23 0.526 DEBUG Unnecessary package: requests==2.32.5 2025-09-07T06:53:16.4693173Z #23 0.526 DEBUG Unnecessary package: rich==14.1.0 2025-09-07T06:53:16.4693698Z #23 0.526 DEBUG Unnecessary package: rich-toolkit==0.15.1 2025-09-07T06:53:16.4694160Z #23 0.526 DEBUG Unnecessary package: rignore==0.6.4 2025-09-07T06:53:16.4694596Z #23 0.526 DEBUG Unnecessary package: rpds-py==0.27.1 2025-09-07T06:53:16.4695056Z #23 0.526 DEBUG Unnecessary package: safetensors==0.6.2 2025-09-07T06:53:16.4695555Z #23 0.526 DEBUG Unnecessary package: scipy==1.16.1 2025-09-07T06:53:16.4696003Z #23 0.526 DEBUG Unnecessary package: sentencepiece==0.2.1 2025-09-07T06:53:16.4696485Z #23 0.526 DEBUG Unnecessary package: sentry-sdk==2.37.0 2025-09-07T06:53:16.4696944Z #23 0.526 DEBUG Unnecessary package: setproctitle==1.3.7 2025-09-07T06:53:16.4697423Z #23 0.526 DEBUG Unnecessary package: shellingham==1.5.4 2025-09-07T06:53:16.4697862Z #23 0.526 DEBUG Unnecessary package: six==1.17.0 2025-09-07T06:53:16.4698288Z #23 0.526 DEBUG Unnecessary package: sniffio==1.3.1 2025-09-07T06:53:16.4698727Z #23 0.526 DEBUG Unnecessary package: soundfile==0.13.1 2025-09-07T06:53:16.4699187Z #23 0.526 DEBUG Unnecessary package: soxr==0.5.0.post1 2025-09-07T06:53:16.4699640Z #23 
0.526 DEBUG Unnecessary package: starlette==0.47.3 2025-09-07T06:53:16.4700081Z #23 0.526 DEBUG Unnecessary package: tiktoken==0.11.0 2025-09-07T06:53:16.4700540Z #23 0.526 DEBUG Unnecessary package: tokenizers==0.22.0 2025-09-07T06:53:16.4701493Z #23 0.526 DEBUG Unnecessary package: torchaudio==2.8.0.dev20250906+cu128 (from file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T06:53:16.4702963Z #23 0.526 DEBUG Unnecessary package: torchvision==0.24.0.dev20250906+cu128 (from file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T06:53:16.4703973Z #23 0.526 DEBUG Unnecessary package: tqdm==4.67.1 2025-09-07T06:53:16.4704417Z #23 0.526 DEBUG Unnecessary package: transformers==4.56.1 2025-09-07T06:53:16.4704981Z #23 0.526 DEBUG Unnecessary package: triton==3.4.0 2025-09-07T06:53:16.4705359Z #23 0.526 DEBUG Unnecessary package: typer==0.17.4 2025-09-07T06:53:16.4705791Z #23 0.526 DEBUG Unnecessary package: typing-inspection==0.4.1 2025-09-07T06:53:16.4706209Z #23 0.526 DEBUG Unnecessary package: urllib3==2.5.0 2025-09-07T06:53:16.4706605Z #23 0.526 DEBUG Preserving seed package: uv==0.8.4 2025-09-07T06:53:16.4707005Z #23 0.526 DEBUG Unnecessary package: uvicorn==0.35.0 2025-09-07T06:53:16.4707390Z #23 0.526 DEBUG Unnecessary package: uvloop==0.21.0 2025-09-07T06:53:16.4707799Z #23 0.526 DEBUG Unnecessary package: watchfiles==1.1.0 2025-09-07T06:53:16.4708203Z #23 0.526 DEBUG Unnecessary package: websockets==15.0.1 2025-09-07T06:53:16.4708781Z #23 0.526 DEBUG Unnecessary package: wheel==0.45.1 2025-09-07T06:53:16.4709266Z #23 0.526 DEBUG Unnecessary package: xgrammar==0.1.23 2025-09-07T06:53:16.4709685Z #23 0.526 DEBUG Unnecessary package: yarl==1.20.1 2025-09-07T06:53:18.3175336Z #23 2.573 Prepared 1 package in 2.04s 2025-09-07T06:53:18.7657109Z #23 3.021 Installed 1 package in 447ms 2025-09-07T06:53:18.7658067Z #23 3.021 + xformers==0.0.33+5d4b92a5.d20250907 (from 
file:///workspace/xformers-dist/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl) 2025-09-07T06:53:18.9163048Z #23 3.021 DEBUG Released lock at `/tmp/uv-281d6a3886c08524.lock` 2025-09-07T06:53:34.1174920Z #23 DONE 18.4s 2025-09-07T06:53:34.2710925Z 2025-09-07T06:53:34.2711778Z #24 [base 18/20] RUN uv pip freeze | grep -i '^torch\|^torchvision\|^torchaudio' > torch_build_versions.txt 2025-09-07T06:53:34.7377888Z #24 0.618 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T06:53:34.9153604Z #24 DONE 0.6s 2025-09-07T06:53:34.9153819Z 2025-09-07T06:53:34.9153990Z #25 [base 19/20] RUN cat torch_build_versions.txt 2025-09-07T06:53:35.5444603Z #25 0.780 torch @ file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:53:35.5445530Z #25 0.780 torchaudio @ file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:53:35.5446510Z #25 0.780 torchvision @ file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:53:35.7036491Z #25 DONE 0.8s 2025-09-07T06:53:35.7036956Z 2025-09-07T06:53:35.7037687Z #26 [base 20/20] RUN pip freeze | grep -E 'torch|xformers|torchvision|torchaudio' 2025-09-07T06:53:37.0994737Z #26 1.547 pytorch-triton @ file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T06:53:37.0996201Z #26 1.547 torch @ file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:53:37.0997093Z #26 1.547 torchaudio @ file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:53:37.0998080Z #26 1.547 torchvision @ file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T06:53:37.2639407Z #26 1.547 xformers @ file:///workspace/xformers-dist/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T06:53:37.2640144Z #26 DONE 1.6s 
2025-09-07T06:53:37.2640302Z 2025-09-07T06:53:37.2640425Z #27 [build 1/7] COPY . . 2025-09-07T06:53:49.0048917Z #27 ... 2025-09-07T06:53:49.0049162Z 2025-09-07T06:53:49.0049522Z #28 [export-wheels 1/4] COPY --from=base /workspace/xformers-dist /wheels/xformers 2025-09-07T06:53:50.3927697Z #28 DONE 1.4s 2025-09-07T06:53:50.3927953Z 2025-09-07T06:53:50.3928486Z #29 [vllm-base 4/18] COPY --from=base /workspace/torch_build_versions.txt ./torch_build_versions.txt 2025-09-07T06:54:11.5845677Z #29 ... 2025-09-07T06:54:11.5845911Z 2025-09-07T06:54:11.5846028Z #27 [build 1/7] COPY . . 2025-09-07T06:54:11.5846371Z #27 DONE 34.5s 2025-09-07T06:54:11.7914230Z 2025-09-07T06:54:11.7915580Z #29 [vllm-base 4/18] COPY --from=base /workspace/torch_build_versions.txt ./torch_build_versions.txt 2025-09-07T06:54:11.7916462Z #29 DONE 22.6s 2025-09-07T06:54:11.7916614Z 2025-09-07T06:54:11.7916874Z #30 [vllm-base 5/18] COPY --from=base /workspace/xformers-dist /wheels/xformers 2025-09-07T06:54:11.7917385Z #30 DONE 0.0s 2025-09-07T06:54:11.7917528Z 2025-09-07T06:54:11.7917682Z #31 [build 2/7] RUN python3 use_existing_torch.py 2025-09-07T06:54:12.1453496Z #31 0.554 >>> cleaning requirements/common.txt 2025-09-07T06:54:12.1453998Z #31 0.554 <<< done cleaning requirements/common.txt 2025-09-07T06:54:12.1454374Z #31 0.554 2025-09-07T06:54:12.1454682Z #31 0.554 >>> cleaning requirements/build.txt 2025-09-07T06:54:12.1455030Z #31 0.554 removed: 2025-09-07T06:54:12.1455292Z #31 0.554 torch==2.8.0 2025-09-07T06:54:12.1455617Z #31 0.554 <<< done cleaning requirements/build.txt 2025-09-07T06:54:12.1455987Z #31 0.554 2025-09-07T06:54:12.1456276Z #31 0.554 >>> cleaning requirements/cpu-build.txt 2025-09-07T06:54:12.1456638Z #31 0.554 removed: 2025-09-07T06:54:12.1457383Z #31 0.554 # Temporarily used for x86 CPU backend to avoid performance regression of torch>2.6.0+cpu, 2025-09-07T06:54:12.1458084Z #31 0.554 # see https://github.com/pytorch/pytorch/pull/151218 2025-09-07T06:54:12.1458645Z #31 0.554 
--extra-index-url https://download.pytorch.org/whl/cpu 2025-09-07T06:54:12.1459100Z #31 0.554 torch==2.6.0+cpu 2025-09-07T06:54:12.1459455Z #31 0.554 <<< done cleaning requirements/cpu-build.txt 2025-09-07T06:54:12.1459843Z #31 0.554 2025-09-07T06:54:12.1460101Z #31 0.554 >>> cleaning requirements/cpu.txt 2025-09-07T06:54:12.1460454Z #31 0.554 removed: 2025-09-07T06:54:12.1460824Z #31 0.554 --extra-index-url https://download.pytorch.org/whl/cpu 2025-09-07T06:54:12.1461818Z #31 0.554 torch==2.6.0+cpu; platform_machine == "x86_64" # torch>2.6.0+cpu has performance regression on x86 platform, see https://github.com/pytorch/pytorch/pull/151218 2025-09-07T06:54:12.1462703Z #31 0.554 torch==2.8.0; platform_system == "Darwin" 2025-09-07T06:54:12.1463267Z #31 0.554 torch==2.8.0; platform_machine == "ppc64le" or platform_machine == "aarch64" 2025-09-07T06:54:12.1464149Z #31 0.554 # required for the image processor of minicpm-o-2_6, this must be updated alongside torch 2025-09-07T06:54:12.1464886Z #31 0.554 torchaudio; platform_machine != "ppc64le" and platform_machine != "s390x" 2025-09-07T06:54:12.1465614Z #31 0.554 torchaudio==2.8.0; platform_machine == "ppc64le" 2025-09-07T06:54:12.1466224Z #31 0.554 # required for the image processor of phi3v, this must be updated alongside torch 2025-09-07T06:54:12.1466928Z #31 0.554 torchvision; platform_machine != "ppc64le" and platform_machine != "s390x" 2025-09-07T06:54:12.1467599Z #31 0.554 torchvision==0.23.0; platform_machine == "ppc64le" 2025-09-07T06:54:12.1468087Z #31 0.554 # Intel Extension for PyTorch, only for x86_64 CPUs 2025-09-07T06:54:12.1469061Z #31 0.554 intel_extension_for_pytorch==2.6.0; platform_machine == "x86_64" # torch>2.6.0+cpu has performance regression on x86 platform, see https://github.com/pytorch/pytorch/pull/151218 2025-09-07T06:54:12.1470392Z #31 0.554 triton==3.2.0; platform_machine == "x86_64" # Triton is required for torch 2.6+cpu, as it is imported in torch.compile. 
2025-09-07T06:54:12.1471056Z #31 0.554 <<< done cleaning requirements/cpu.txt 2025-09-07T06:54:12.1471411Z #31 0.554 2025-09-07T06:54:12.1471674Z #31 0.554 >>> cleaning requirements/cuda.txt 2025-09-07T06:54:12.1471993Z #31 0.554 removed: 2025-09-07T06:54:12.1472238Z #31 0.554 torch==2.8.0 2025-09-07T06:54:12.1472493Z #31 0.554 torchaudio==2.8.0 2025-09-07T06:54:12.1472821Z #31 0.554 # These must be updated alongside torch 2025-09-07T06:54:12.1473667Z #31 0.554 torchvision==0.23.0 # Required for phi3v processor. See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version 2025-09-07T06:54:12.1474771Z #31 0.554 xformers==0.0.32.post1; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch >= 2.8 2025-09-07T06:54:12.1475410Z #31 0.554 <<< done cleaning requirements/cuda.txt 2025-09-07T06:54:12.1475761Z #31 0.554 2025-09-07T06:54:12.1476081Z #31 0.554 >>> cleaning requirements/dev.txt 2025-09-07T06:54:12.1476446Z #31 0.554 <<< done cleaning requirements/dev.txt 2025-09-07T06:54:12.1476792Z #31 0.554 2025-09-07T06:54:12.1477034Z #31 0.554 >>> cleaning requirements/docs.txt 2025-09-07T06:54:12.1477372Z #31 0.554 removed: 2025-09-07T06:54:12.1477663Z #31 0.554 -f https://download.pytorch.org/whl/cpu 2025-09-07T06:54:12.1478026Z #31 0.554 torch 2025-09-07T06:54:12.1478292Z #31 0.554 <<< done cleaning requirements/docs.txt 2025-09-07T06:54:12.1478635Z #31 0.554 2025-09-07T06:54:12.1478920Z #31 0.554 >>> cleaning requirements/kv_connectors.txt 2025-09-07T06:54:12.1479345Z #31 0.554 <<< done cleaning requirements/kv_connectors.txt 2025-09-07T06:54:12.1479720Z #31 0.554 2025-09-07T06:54:12.1479967Z #31 0.554 >>> cleaning requirements/lint.txt 2025-09-07T06:54:12.1480353Z #31 0.554 <<< done cleaning requirements/lint.txt 2025-09-07T06:54:12.1480688Z #31 0.554 2025-09-07T06:54:12.1481043Z #31 0.554 >>> cleaning requirements/nightly_torch_test.txt 2025-09-07T06:54:12.1481501Z #31 0.554 <<< done cleaning 
requirements/nightly_torch_test.txt 2025-09-07T06:54:12.1482063Z #31 0.554 2025-09-07T06:54:12.1482329Z #31 0.554 >>> cleaning requirements/rocm-build.txt 2025-09-07T06:54:12.1482701Z #31 0.554 removed: 2025-09-07T06:54:12.1483086Z #31 0.554 --extra-index-url https://download.pytorch.org/whl/rocm6.3 2025-09-07T06:54:12.1483623Z #31 0.554 torch==2.8.0 2025-09-07T06:54:12.1483903Z #31 0.554 torchvision==0.23.0 2025-09-07T06:54:12.1484198Z #31 0.554 torchaudio==2.8.0 2025-09-07T06:54:12.1484553Z #31 0.554 <<< done cleaning requirements/rocm-build.txt 2025-09-07T06:54:12.1484911Z #31 0.554 2025-09-07T06:54:12.1485191Z #31 0.554 >>> cleaning requirements/rocm-test.txt 2025-09-07T06:54:12.1485603Z #31 0.554 <<< done cleaning requirements/rocm-test.txt 2025-09-07T06:54:12.1485975Z #31 0.554 2025-09-07T06:54:12.1486244Z #31 0.554 >>> cleaning requirements/rocm.txt 2025-09-07T06:54:12.1486630Z #31 0.554 <<< done cleaning requirements/rocm.txt 2025-09-07T06:54:12.1486995Z #31 0.554 2025-09-07T06:54:12.1487295Z #31 0.554 >>> cleaning requirements/test.txt 2025-09-07T06:54:12.1487664Z #31 0.554 removed: 2025-09-07T06:54:12.1488305Z #31 0.554 # uv pip compile requirements/test.in -o requirements/test.txt --index-strategy unsafe-best-match --torch-backend cu128 2025-09-07T06:54:12.1489073Z #31 0.554 # via terratorch 2025-09-07T06:54:12.1489353Z #31 0.554 # via terratorch 2025-09-07T06:54:12.1489673Z #31 0.554 efficientnet-pytorch==0.7.1 2025-09-07T06:54:12.1490046Z #31 0.554 # via segmentation-models-pytorch 2025-09-07T06:54:12.1490451Z #31 0.554 # terratorch 2025-09-07T06:54:12.1490734Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1491017Z #31 0.554 # vector-quantize-pytorch 2025-09-07T06:54:12.1491376Z #31 0.554 # via vector-quantize-pytorch 2025-09-07T06:54:12.1491696Z #31 0.554 # torch 2025-09-07T06:54:12.1492222Z #31 0.554 # via torchgeo 2025-09-07T06:54:12.1492846Z #31 0.554 # pytorch-lightning 2025-09-07T06:54:12.1493200Z #31 0.554 # torch 2025-09-07T06:54:12.1493464Z #31 
0.554 # via open-clip-torch 2025-09-07T06:54:12.1493791Z #31 0.554 # via terratorch 2025-09-07T06:54:12.1494074Z #31 0.554 # via terratorch 2025-09-07T06:54:12.1494384Z #31 0.554 # open-clip-torch 2025-09-07T06:54:12.1494733Z #31 0.554 # segmentation-models-pytorch 2025-09-07T06:54:12.1495079Z #31 0.554 # terratorch 2025-09-07T06:54:12.1495353Z #31 0.554 # torch 2025-09-07T06:54:12.1495593Z #31 0.554 # terratorch 2025-09-07T06:54:12.1495877Z #31 0.554 # via torchgeo 2025-09-07T06:54:12.1496143Z #31 0.554 # terratorch 2025-09-07T06:54:12.1496440Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1496694Z #31 0.554 # terratorch 2025-09-07T06:54:12.1496973Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1497241Z #31 0.554 # pytorch-lightning 2025-09-07T06:54:12.1497562Z #31 0.554 # torchmetrics 2025-09-07T06:54:12.1497853Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1498116Z #31 0.554 # via terratorch 2025-09-07T06:54:12.1498421Z #31 0.554 # torch 2025-09-07T06:54:12.1498806Z #31 0.554 # segmentation-models-pytorch 2025-09-07T06:54:12.1499170Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1499433Z #31 0.554 # torchmetrics 2025-09-07T06:54:12.1499724Z #31 0.554 # torchvision 2025-09-07T06:54:12.1499997Z #31 0.554 # torch 2025-09-07T06:54:12.1500251Z #31 0.554 # via torch 2025-09-07T06:54:12.1500503Z #31 0.554 # via torch 2025-09-07T06:54:12.1500767Z #31 0.554 # via torch 2025-09-07T06:54:12.1501019Z #31 0.554 # via torch 2025-09-07T06:54:12.1501284Z #31 0.554 # via torch 2025-09-07T06:54:12.1501545Z #31 0.554 # via torch 2025-09-07T06:54:12.1501796Z #31 0.554 # via torch 2025-09-07T06:54:12.1502060Z #31 0.554 # via torch 2025-09-07T06:54:12.1502307Z #31 0.554 # torch 2025-09-07T06:54:12.1502562Z #31 0.554 # via torch 2025-09-07T06:54:12.1502814Z #31 0.554 # via torch 2025-09-07T06:54:12.1503074Z #31 0.554 # torch 2025-09-07T06:54:12.1503314Z #31 0.554 # via torch 2025-09-07T06:54:12.1503603Z #31 0.554 open-clip-torch==2.32.0 2025-09-07T06:54:12.1504007Z #31 0.554 # pytorch-lightning 
2025-09-07T06:54:12.1504463Z #31 0.554 # torchmetrics 2025-09-07T06:54:12.1504729Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1505032Z #31 0.554 # segmentation-models-pytorch 2025-09-07T06:54:12.1505385Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1505637Z #31 0.554 # torchvision 2025-09-07T06:54:12.1505958Z #31 0.554 # via segmentation-models-pytorch 2025-09-07T06:54:12.1506305Z #31 0.554 # via terratorch 2025-09-07T06:54:12.1506582Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1506829Z #31 0.554 # terratorch 2025-09-07T06:54:12.1507115Z #31 0.554 # via terratorch 2025-09-07T06:54:12.1507404Z #31 0.554 pytorch-lightning==2.5.2 2025-09-07T06:54:12.1507745Z #31 0.554 # pytorch-lightning 2025-09-07T06:54:12.1508037Z #31 0.554 # terratorch 2025-09-07T06:54:12.1508309Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1508584Z #31 0.554 # open-clip-torch 2025-09-07T06:54:12.1508872Z #31 0.554 # via terratorch 2025-09-07T06:54:12.1509156Z #31 0.554 # via torchgeo 2025-09-07T06:54:12.1509430Z #31 0.554 # open-clip-torch 2025-09-07T06:54:12.1509773Z #31 0.554 segmentation-models-pytorch==0.4.0 2025-09-07T06:54:12.1510121Z #31 0.554 # terratorch 2025-09-07T06:54:12.1510388Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1510739Z #31 0.554 # torch 2025-09-07T06:54:12.1510989Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1511268Z #31 0.554 # segmentation-models-pytorch 2025-09-07T06:54:12.1511610Z #31 0.554 # torch 2025-09-07T06:54:12.1511873Z #31 0.554 terratorch==1.1rc3 2025-09-07T06:54:12.1512153Z #31 0.554 # terratorch 2025-09-07T06:54:12.1512493Z #31 0.554 # open-clip-torch 2025-09-07T06:54:12.1512808Z #31 0.554 # segmentation-models-pytorch 2025-09-07T06:54:12.1513167Z #31 0.554 # terratorch 2025-09-07T06:54:12.1513425Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1513700Z #31 0.554 torch==2.8.0+cu128 2025-09-07T06:54:12.1513996Z #31 0.554 # efficientnet-pytorch 2025-09-07T06:54:12.1514333Z #31 0.554 # open-clip-torch 2025-09-07T06:54:12.1514635Z #31 0.554 # pytorch-lightning 
2025-09-07T06:54:12.1514975Z #31 0.554 # segmentation-models-pytorch 2025-09-07T06:54:12.1515326Z #31 0.554 # terratorch 2025-09-07T06:54:12.1515585Z #31 0.554 # torchaudio 2025-09-07T06:54:12.1515855Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1516110Z #31 0.554 # torchmetrics 2025-09-07T06:54:12.1516393Z #31 0.554 # torchvision 2025-09-07T06:54:12.1516697Z #31 0.554 # vector-quantize-pytorch 2025-09-07T06:54:12.1517041Z #31 0.554 torchaudio==2.8.0+cu128 2025-09-07T06:54:12.1517345Z #31 0.554 torchgeo==0.7.0 2025-09-07T06:54:12.1517637Z #31 0.554 # via terratorch 2025-09-07T06:54:12.1517924Z #31 0.554 torchmetrics==1.7.4 2025-09-07T06:54:12.1518240Z #31 0.554 # pytorch-lightning 2025-09-07T06:54:12.1518534Z #31 0.554 # terratorch 2025-09-07T06:54:12.1518809Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1519097Z #31 0.554 torchvision==0.23.0+cu128 2025-09-07T06:54:12.1519420Z #31 0.554 # open-clip-torch 2025-09-07T06:54:12.1519755Z #31 0.554 # segmentation-models-pytorch 2025-09-07T06:54:12.1520141Z #31 0.554 # terratorch 2025-09-07T06:54:12.1520414Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1520677Z #31 0.554 # open-clip-torch 2025-09-07T06:54:12.1520994Z #31 0.554 # pytorch-lightning 2025-09-07T06:54:12.1521320Z #31 0.554 # segmentation-models-pytorch 2025-09-07T06:54:12.1521674Z #31 0.554 # via torch 2025-09-07T06:54:12.1521938Z #31 0.554 # pytorch-lightning 2025-09-07T06:54:12.1522241Z #31 0.554 # torch 2025-09-07T06:54:12.1522496Z #31 0.554 # torchgeo 2025-09-07T06:54:12.1522785Z #31 0.554 vector-quantize-pytorch==1.21.2 2025-09-07T06:54:12.1523190Z #31 0.554 <<< done cleaning requirements/test.txt 2025-09-07T06:54:12.1523538Z #31 0.554 2025-09-07T06:54:12.1523813Z #31 0.554 >>> cleaning requirements/tpu.txt 2025-09-07T06:54:12.1524142Z #31 0.554 removed: 2025-09-07T06:54:12.1524399Z #31 0.554 # Install torch_xla 2025-09-07T06:54:12.1524846Z #31 0.554 --extra-index-url https://download.pytorch.org/whl/nightly/cpu 2025-09-07T06:54:12.1525377Z #31 0.554 
torch==2.9.0.dev20250730 2025-09-07T06:54:12.1525701Z #31 0.554 torchvision==0.24.0.dev20250730 2025-09-07T06:54:12.1526705Z #31 0.554 torch_xla[tpu, pallas] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.9.0.dev20250730-cp311-cp311-linux_x86_64.whl ; python_version == "3.11" 2025-09-07T06:54:12.1528333Z #31 0.554 torch_xla[tpu, pallas] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.9.0.dev20250730-cp312-cp312-linux_x86_64.whl ; python_version == "3.12" 2025-09-07T06:54:12.1529333Z #31 0.554 <<< done cleaning requirements/tpu.txt 2025-09-07T06:54:12.1529699Z #31 0.554 2025-09-07T06:54:12.1530006Z #31 0.554 >>> cleaning requirements/xpu.txt 2025-09-07T06:54:12.1530351Z #31 0.554 removed: 2025-09-07T06:54:12.1530721Z #31 0.554 --extra-index-url=https://download.pytorch.org/whl/xpu 2025-09-07T06:54:12.1531155Z #31 0.554 torch==2.8.0+xpu 2025-09-07T06:54:12.1531439Z #31 0.554 torchaudio 2025-09-07T06:54:12.1531689Z #31 0.554 torchvision 2025-09-07T06:54:12.1531982Z #31 0.554 pytorch-triton-xpu 2025-09-07T06:54:12.1532636Z #31 0.554 --extra-index-url=https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ 2025-09-07T06:54:12.1533463Z #31 0.554 intel-extension-for-pytorch==2.8.10+xpu 2025-09-07T06:54:12.1533962Z #31 0.554 <<< done cleaning requirements/xpu.txt 2025-09-07T06:54:12.1534327Z #31 0.554 2025-09-07T06:54:12.1534585Z #31 0.554 >>> cleaning pyproject.toml 2025-09-07T06:54:12.1534905Z #31 0.554 removed: 2025-09-07T06:54:12.1535166Z #31 0.554 "torch == 2.8.0", 2025-09-07T06:54:12.1535481Z #31 0.554 <<< done cleaning pyproject.toml 2025-09-07T06:54:12.1535868Z #31 0.554 2025-09-07T06:54:12.3102106Z #31 DONE 0.6s 2025-09-07T06:54:12.3102609Z 2025-09-07T06:54:12.3103708Z #32 [build 3/7] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system -r requirements/build.txt 2025-09-07T06:54:12.9714921Z #32 0.812 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 
2025-09-07T06:54:13.1746900Z #32 0.863 Resolved 11 packages in 40ms 2025-09-07T06:54:13.1747346Z #32 0.865 Downloading cmake (28.3MiB) 2025-09-07T06:54:13.5237979Z #32 1.365 Downloading cmake 2025-09-07T06:54:13.6741177Z #32 1.365 Prepared 2 packages in 501ms 2025-09-07T06:54:13.7694232Z #32 1.610 Installed 2 packages in 245ms 2025-09-07T06:54:13.7694680Z #32 1.610 + cmake==4.1.0 2025-09-07T06:54:13.7694978Z #32 1.610 + setuptools-scm==9.2.0 2025-09-07T06:54:14.2432103Z #32 DONE 2.1s 2025-09-07T06:54:14.3950368Z 2025-09-07T06:54:14.3951730Z #33 [build 4/7] RUN --mount=type=bind,source=.git,target=.git if [ "0" != "0" ]; then bash tools/check_repo.sh ; fi 2025-09-07T06:54:14.7110801Z #33 DONE 0.5s 2025-09-07T06:54:14.8630739Z 2025-09-07T06:54:14.8634752Z #34 [build 5/7] RUN --mount=type=cache,target=/root/.cache/uv --mount=type=bind,source=.git,target=.git if [ "1" = "1" ]; then echo "Installing sccache..." && curl -L -o sccache.tar.gz https://github.com/mozilla/sccache/releases/download/v0.8.1/sccache-v0.8.1-x86_64-unknown-linux-musl.tar.gz && tar -xzf sccache.tar.gz && sudo mv sccache-v0.8.1-x86_64-unknown-linux-musl/sccache /usr/bin/sccache && rm -rf sccache.tar.gz sccache-v0.8.1-x86_64-unknown-linux-musl && export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 && export SCCACHE_REGION=us-east-1 && export SCCACHE_S3_NO_CREDENTIALS=0 && export SCCACHE_IDLE_TIMEOUT=0 && export CMAKE_BUILD_TYPE=Release && export VLLM_DOCKER_BUILD_CONTEXT=1 && sccache --show-stats && python3 setup.py bdist_wheel --dist-dir=vllm-dist --py-limited-api=cp38 && sccache --show-stats; fi 2025-09-07T06:54:15.7377659Z #34 1.026 Installing sccache... 
2025-09-07T06:54:15.8634861Z #34 1.032 % Total % Received % Xferd Average Speed Time Time Time Current 2025-09-07T06:54:15.8635496Z #34 1.032 Dload Upload Total Spent Left Speed 2025-09-07T06:54:15.8635973Z #34 1.032 2025-09-07T06:54:15.8636522Z 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 2025-09-07T06:54:15.8636950Z 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 2025-09-07T06:54:16.0515748Z #34 1.189 2025-09-07T06:54:16.0516206Z 100 9113k 100 9113k 0 0 56.5M 0 --:--:-- --:--:-- --:--:-- 56.5M 2025-09-07T06:54:16.1509130Z #34 1.439 Compile requests 0 2025-09-07T06:54:16.1509611Z #34 1.439 Compile requests executed 0 2025-09-07T06:54:16.1510022Z #34 1.439 Cache hits 0 2025-09-07T06:54:16.1510407Z #34 1.439 Cache misses 0 2025-09-07T06:54:16.1510805Z #34 1.439 Cache timeouts 0 2025-09-07T06:54:16.1511327Z #34 1.439 Cache read errors 0 2025-09-07T06:54:16.1511692Z #34 1.439 Forced recaches 0 2025-09-07T06:54:16.1512069Z #34 1.439 Cache write errors 0 2025-09-07T06:54:16.1512455Z #34 1.439 Compilation failures 0 2025-09-07T06:54:16.1512832Z #34 1.439 Cache errors 0 2025-09-07T06:54:16.1513223Z #34 1.439 Non-cacheable compilations 0 2025-09-07T06:54:16.1513603Z #34 1.439 Non-cacheable calls 0 2025-09-07T06:54:16.1514172Z #34 1.439 Non-compilation calls 0 2025-09-07T06:54:16.1514555Z #34 1.439 Unsupported compiler calls 0 2025-09-07T06:54:16.1514955Z #34 1.439 Average cache write 0.000 s 2025-09-07T06:54:16.1515345Z #34 1.439 Average compiler 0.000 s 2025-09-07T06:54:16.1515815Z #34 1.439 Average cache read hit 0.000 s 2025-09-07T06:54:16.1516225Z #34 1.439 Failed distributed compilations 0 2025-09-07T06:54:16.1516762Z #34 1.439 Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-09-07T06:54:16.1517295Z #34 1.439 Version (client) 0.8.1 2025-09-07T06:54:18.0350195Z #34 3.323 W0907 06:54:18.033000 70 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/torch/utils/cpp_extension.py:117] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' 
2025-09-07T06:54:18.2973173Z #34 3.585 /opt/python/cp312-cp312/lib/python3.12/site-packages/setuptools_scm/_integration/version_inference.py:51: UserWarning: version of None already set 2025-09-07T06:54:18.5379368Z #34 3.585 warnings.warn(self.message) 2025-09-07T06:54:18.5379825Z #34 3.826 running bdist_wheel 2025-09-07T06:54:18.6380338Z #34 3.871 running build 2025-09-07T06:54:18.6380697Z #34 3.871 running build_py 2025-09-07T06:54:18.6381109Z #34 3.883 creating build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6381727Z #34 3.883 copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6382386Z #34 3.884 copying vllm/_custom_ops.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6383027Z #34 3.884 copying vllm/_ipex_ops.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6383879Z #34 3.884 copying vllm/beam_search.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6384544Z #34 3.884 copying vllm/collect_env.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6385335Z #34 3.885 copying vllm/connections.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6385999Z #34 3.885 copying vllm/env_override.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6386609Z #34 3.885 copying vllm/envs.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6387251Z #34 3.885 copying vllm/forward_context.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6387903Z #34 3.886 copying vllm/logger.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6388550Z #34 3.886 copying vllm/logits_process.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6389196Z #34 3.886 copying vllm/logprobs.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6389821Z #34 3.886 copying vllm/outputs.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6390559Z #34 3.886 copying vllm/pooling_params.py -> 
build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6391244Z #34 3.887 copying vllm/sampling_params.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6392096Z #34 3.887 copying vllm/scalar_type.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6392922Z #34 3.887 copying vllm/scripts.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6393570Z #34 3.887 copying vllm/sequence.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6394191Z #34 3.888 copying vllm/tasks.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6394831Z #34 3.888 copying vllm/test_utils.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6395475Z #34 3.888 copying vllm/tracing.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6396094Z #34 3.888 copying vllm/version.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6396733Z #34 3.889 copying vllm/_version.py -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:18.6397349Z #34 3.889 creating build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T06:54:18.6398116Z #34 3.889 copying vllm/adapter_commons/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T06:54:18.6399107Z #34 3.889 copying vllm/adapter_commons/layers.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T06:54:18.6400009Z #34 3.889 copying vllm/adapter_commons/models.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T06:54:18.6400998Z #34 3.890 copying vllm/adapter_commons/request.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T06:54:18.6401894Z #34 3.890 copying vllm/adapter_commons/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T06:54:18.6402844Z #34 3.890 copying vllm/adapter_commons/worker_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/adapter_commons 2025-09-07T06:54:18.6403624Z #34 3.890 creating 
build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T06:54:18.6404372Z #34 3.890 copying vllm/assets/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T06:54:18.6405100Z #34 3.891 copying vllm/assets/audio.py -> build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T06:54:18.6405808Z #34 3.891 copying vllm/assets/base.py -> build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T06:54:18.6406524Z #34 3.891 copying vllm/assets/image.py -> build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T06:54:18.6407233Z #34 3.891 copying vllm/assets/video.py -> build/lib.linux-x86_64-cpython-312/vllm/assets 2025-09-07T06:54:18.6407878Z #34 3.892 creating build/lib.linux-x86_64-cpython-312/vllm/attention 2025-09-07T06:54:18.6408543Z #34 3.892 copying vllm/attention/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/attention 2025-09-07T06:54:18.6409301Z #34 3.892 copying vllm/attention/layer.py -> build/lib.linux-x86_64-cpython-312/vllm/attention 2025-09-07T06:54:18.6410150Z #34 3.892 copying vllm/attention/selector.py -> build/lib.linux-x86_64-cpython-312/vllm/attention 2025-09-07T06:54:18.6410826Z #34 3.892 creating build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T06:54:18.6411508Z #34 3.893 copying vllm/benchmarks/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T06:54:18.6412324Z #34 3.893 copying vllm/benchmarks/datasets.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T06:54:18.6413421Z #34 3.893 copying vllm/benchmarks/latency.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T06:54:18.6414257Z #34 3.893 copying vllm/benchmarks/serve.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T06:54:18.6415103Z #34 3.894 copying vllm/benchmarks/throughput.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks 2025-09-07T06:54:18.6415853Z #34 3.894 creating build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6416633Z #34 3.894 
copying vllm/compilation/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6417565Z #34 3.894 copying vllm/compilation/activation_quant_fusion.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6418517Z #34 3.894 copying vllm/compilation/backends.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6419422Z #34 3.895 copying vllm/compilation/base_static_graph.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6420383Z #34 3.895 copying vllm/compilation/collective_fusion.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6421362Z #34 3.895 copying vllm/compilation/compiler_interface.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6422275Z #34 3.895 copying vllm/compilation/counter.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6423156Z #34 3.896 copying vllm/compilation/cuda_graph.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6424092Z #34 3.896 copying vllm/compilation/cuda_piecewise_backend.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6425156Z #34 3.896 copying vllm/compilation/decorators.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6426128Z #34 3.896 copying vllm/compilation/fix_functionalization.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6427025Z #34 3.896 copying vllm/compilation/fusion.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6427872Z #34 3.897 copying vllm/compilation/fusion_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6428737Z #34 3.897 copying vllm/compilation/fx_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6429600Z #34 3.897 copying vllm/compilation/inductor_pass.py -> 
build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6430508Z #34 3.897 copying vllm/compilation/monitor.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6431398Z #34 3.898 copying vllm/compilation/multi_output_match.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6432313Z #34 3.898 copying vllm/compilation/noop_elimination.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6433223Z #34 3.898 copying vllm/compilation/pass_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6434159Z #34 3.898 copying vllm/compilation/sequence_parallelism.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6435136Z #34 3.898 copying vllm/compilation/torch25_custom_graph_pass.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6436112Z #34 3.899 copying vllm/compilation/vllm_inductor_pass.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6436982Z #34 3.899 copying vllm/compilation/wrapper.py -> build/lib.linux-x86_64-cpython-312/vllm/compilation 2025-09-07T06:54:18.6437700Z #34 3.899 creating build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T06:54:18.6438309Z #34 3.899 copying vllm/config/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T06:54:18.6439028Z #34 3.900 copying vllm/config/cache.py -> build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T06:54:18.6439786Z #34 3.900 copying vllm/config/compilation.py -> build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T06:54:18.6440548Z #34 3.900 copying vllm/config/parallel.py -> build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T06:54:18.6441313Z #34 3.900 copying vllm/config/scheduler.py -> build/lib.linux-x86_64-cpython-312/vllm/config 2025-09-07T06:54:18.6442040Z #34 3.900 copying vllm/config/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/config 
2025-09-07T06:54:18.6442660Z #34 3.901 creating build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T06:54:18.6443248Z #34 3.901 copying vllm/core/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T06:54:18.6443983Z #34 3.901 copying vllm/core/block_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T06:54:18.6444709Z #34 3.901 copying vllm/core/evictor.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T06:54:18.6445422Z #34 3.902 copying vllm/core/interfaces.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T06:54:18.6446277Z #34 3.902 copying vllm/core/placeholder_block_space_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T06:54:18.6447108Z #34 3.902 copying vllm/core/scheduler.py -> build/lib.linux-x86_64-cpython-312/vllm/core 2025-09-07T06:54:18.6447791Z #34 3.902 creating build/lib.linux-x86_64-cpython-312/vllm/device_allocator 2025-09-07T06:54:18.6448546Z #34 3.902 copying vllm/device_allocator/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/device_allocator 2025-09-07T06:54:18.6449447Z #34 3.903 copying vllm/device_allocator/cumem.py -> build/lib.linux-x86_64-cpython-312/vllm/device_allocator 2025-09-07T06:54:18.6450172Z #34 3.903 creating build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T06:54:18.6450870Z #34 3.903 copying vllm/distributed/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T06:54:18.6451736Z #34 3.903 copying vllm/distributed/communication_op.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T06:54:18.6452750Z #34 3.903 copying vllm/distributed/kv_events.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T06:54:18.6453830Z #34 3.904 copying vllm/distributed/parallel_state.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T06:54:18.6454827Z #34 3.904 copying vllm/distributed/tpu_distributed_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 
2025-09-07T06:54:18.6455751Z #34 3.904 copying vllm/distributed/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed 2025-09-07T06:54:18.6456434Z #34 3.904 creating build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T06:54:18.6457084Z #34 3.905 copying vllm/engine/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T06:54:18.6457847Z #34 3.905 copying vllm/engine/arg_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T06:54:18.6458641Z #34 3.905 copying vllm/engine/async_llm_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T06:54:18.6459474Z #34 3.905 copying vllm/engine/async_timeout.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T06:54:18.6460259Z #34 3.906 copying vllm/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T06:54:18.6461034Z #34 3.906 copying vllm/engine/metrics.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T06:54:18.6461828Z #34 3.906 copying vllm/engine/metrics_types.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T06:54:18.6462611Z #34 3.906 copying vllm/engine/protocol.py -> build/lib.linux-x86_64-cpython-312/vllm/engine 2025-09-07T06:54:18.6463303Z #34 3.907 creating build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6464058Z #34 3.907 copying vllm/entrypoints/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6465033Z #34 3.907 copying vllm/entrypoints/api_server.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6465895Z #34 3.907 copying vllm/entrypoints/chat_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6466746Z #34 3.908 copying vllm/entrypoints/constants.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6467596Z #34 3.908 copying vllm/entrypoints/context.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6468449Z 
#34 3.908 copying vllm/entrypoints/harmony_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6469313Z #34 3.908 copying vllm/entrypoints/launcher.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6470164Z #34 3.908 copying vllm/entrypoints/llm.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6470966Z #34 3.909 copying vllm/entrypoints/logger.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6471808Z #34 3.909 copying vllm/entrypoints/renderer.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6472660Z #34 3.909 copying vllm/entrypoints/score_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6473484Z #34 3.909 copying vllm/entrypoints/ssl.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6474286Z #34 3.910 copying vllm/entrypoints/tool.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6475107Z #34 3.910 copying vllm/entrypoints/tool_server.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6475947Z #34 3.910 copying vllm/entrypoints/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints 2025-09-07T06:54:18.6476625Z #34 3.910 creating build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T06:54:18.6477277Z #34 3.910 copying vllm/executor/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T06:54:18.6478058Z #34 3.911 copying vllm/executor/executor_base.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T06:54:18.6478972Z #34 3.911 copying vllm/executor/mp_distributed_executor.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T06:54:18.6479854Z #34 3.911 copying vllm/executor/msgspec_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T06:54:18.6480749Z #34 3.911 copying vllm/executor/multiproc_worker_utils.py -> 
build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T06:54:18.6481675Z #34 3.911 copying vllm/executor/ray_distributed_executor.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T06:54:18.6482521Z #34 3.912 copying vllm/executor/ray_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T06:54:18.6483344Z #34 3.912 copying vllm/executor/uniproc_executor.py -> build/lib.linux-x86_64-cpython-312/vllm/executor 2025-09-07T06:54:18.6484047Z #34 3.912 creating build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T06:54:18.6484655Z #34 3.912 copying vllm/inputs/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T06:54:18.6485371Z #34 3.913 copying vllm/inputs/data.py -> build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T06:54:18.6486078Z #34 3.913 copying vllm/inputs/parse.py -> build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T06:54:18.6486835Z #34 3.913 copying vllm/inputs/preprocess.py -> build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T06:54:18.6487606Z #34 3.913 copying vllm/inputs/registry.py -> build/lib.linux-x86_64-cpython-312/vllm/inputs 2025-09-07T06:54:18.6488272Z #34 3.913 creating build/lib.linux-x86_64-cpython-312/vllm/logging_utils 2025-09-07T06:54:18.6488986Z #34 3.914 copying vllm/logging_utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/logging_utils 2025-09-07T06:54:18.6489860Z #34 3.914 copying vllm/logging_utils/dump_input.py -> build/lib.linux-x86_64-cpython-312/vllm/logging_utils 2025-09-07T06:54:18.6490742Z #34 3.914 copying vllm/logging_utils/formatter.py -> build/lib.linux-x86_64-cpython-312/vllm/logging_utils 2025-09-07T06:54:18.6491441Z #34 3.914 creating build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T06:54:18.6492155Z #34 3.914 copying vllm/lora/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T06:54:18.6493167Z #34 3.915 copying vllm/lora/fully_sharded_layers.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 
2025-09-07T06:54:18.6493938Z #34 3.915 copying vllm/lora/layers.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T06:54:18.6494646Z #34 3.915 copying vllm/lora/lora.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T06:54:18.6495334Z #34 3.915 copying vllm/lora/models.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T06:54:18.6496133Z #34 3.916 copying vllm/lora/peft_helper.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T06:54:18.6496873Z #34 3.916 copying vllm/lora/request.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T06:54:18.6497589Z #34 3.916 copying vllm/lora/resolver.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T06:54:18.6498313Z #34 3.916 copying vllm/lora/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T06:54:18.6499042Z #34 3.916 copying vllm/lora/worker_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/lora 2025-09-07T06:54:18.6499750Z #34 3.917 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T06:54:18.6500508Z #34 3.917 copying vllm/model_executor/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T06:54:18.6501393Z #34 3.917 copying vllm/model_executor/custom_op.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T06:54:18.6502314Z #34 3.917 copying vllm/model_executor/parameter.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T06:54:18.6503269Z #34 3.917 copying vllm/model_executor/sampling_metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T06:54:18.6504209Z #34 3.918 copying vllm/model_executor/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor 2025-09-07T06:54:18.6505100Z #34 3.918 creating build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6505763Z #34 3.918 copying vllm/multimodal/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6506556Z #34 3.918 
copying vllm/multimodal/audio.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6507376Z #34 3.919 copying vllm/multimodal/base.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6508161Z #34 3.919 copying vllm/multimodal/cache.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6508954Z #34 3.919 copying vllm/multimodal/hasher.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6509738Z #34 3.919 copying vllm/multimodal/image.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6510531Z #34 3.919 copying vllm/multimodal/inputs.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6511315Z #34 3.920 copying vllm/multimodal/parse.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6512143Z #34 3.920 copying vllm/multimodal/processing.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6512986Z #34 3.920 copying vllm/multimodal/profiling.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6513827Z #34 3.920 copying vllm/multimodal/registry.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6514642Z #34 3.921 copying vllm/multimodal/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6515414Z #34 3.921 copying vllm/multimodal/video.py -> build/lib.linux-x86_64-cpython-312/vllm/multimodal 2025-09-07T06:54:18.6516132Z #34 3.921 creating build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T06:54:18.6516794Z #34 3.921 copying vllm/platforms/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T06:54:18.6517554Z #34 3.921 copying vllm/platforms/cpu.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T06:54:18.6518319Z #34 3.922 copying vllm/platforms/cuda.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T06:54:18.6519099Z #34 3.922 
copying vllm/platforms/interface.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T06:54:18.6519891Z #34 3.922 copying vllm/platforms/rocm.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T06:54:18.6520634Z #34 3.922 copying vllm/platforms/tpu.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T06:54:18.6521384Z #34 3.923 copying vllm/platforms/xpu.py -> build/lib.linux-x86_64-cpython-312/vllm/platforms 2025-09-07T06:54:18.6522059Z #34 3.923 creating build/lib.linux-x86_64-cpython-312/vllm/plugins 2025-09-07T06:54:18.6522687Z #34 3.923 copying vllm/plugins/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/plugins 2025-09-07T06:54:18.6523334Z #34 3.923 creating build/lib.linux-x86_64-cpython-312/vllm/profiler 2025-09-07T06:54:18.6523976Z #34 3.923 copying vllm/profiler/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/profiler 2025-09-07T06:54:18.6524801Z #34 3.924 copying vllm/profiler/layerwise_profile.py -> build/lib.linux-x86_64-cpython-312/vllm/profiler 2025-09-07T06:54:18.6525607Z #34 3.924 copying vllm/profiler/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/profiler 2025-09-07T06:54:18.6526244Z #34 3.924 creating build/lib.linux-x86_64-cpython-312/vllm/ray 2025-09-07T06:54:18.6526820Z #34 3.924 copying vllm/ray/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/ray 2025-09-07T06:54:18.6527490Z #34 3.924 copying vllm/ray/lazy_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/ray 2025-09-07T06:54:18.6528181Z #34 3.925 copying vllm/ray/ray_env.py -> build/lib.linux-x86_64-cpython-312/vllm/ray 2025-09-07T06:54:18.6528782Z #34 3.925 creating build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.6529447Z #34 3.925 copying vllm/reasoning/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.6530328Z #34 3.925 copying vllm/reasoning/abs_reasoning_parsers.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.6531287Z #34 3.925 
copying vllm/reasoning/deepseek_r1_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.6532272Z #34 3.926 copying vllm/reasoning/glm4_moe_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.6533525Z #34 3.926 copying vllm/reasoning/gptoss_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.6534520Z #34 3.926 copying vllm/reasoning/granite_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.7382748Z #34 3.926 copying vllm/reasoning/hunyuan_a13b_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.7383799Z #34 3.926 copying vllm/reasoning/mistral_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.7384929Z #34 3.927 copying vllm/reasoning/qwen3_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.7385922Z #34 3.927 copying vllm/reasoning/step3_reasoning_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/reasoning 2025-09-07T06:54:18.7386688Z #34 3.927 creating build/lib.linux-x86_64-cpython-312/vllm/third_party 2025-09-07T06:54:18.7387374Z #34 3.927 copying vllm/third_party/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/third_party 2025-09-07T06:54:18.7388185Z #34 3.928 copying vllm/third_party/pynvml.py -> build/lib.linux-x86_64-cpython-312/vllm/third_party 2025-09-07T06:54:18.7388900Z #34 3.928 creating build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7390058Z #34 3.928 copying vllm/transformers_utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7391063Z #34 3.928 copying vllm/transformers_utils/config.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7392259Z #34 3.929 copying vllm/transformers_utils/detokenizer.py -> 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7393339Z #34 3.929 copying vllm/transformers_utils/detokenizer_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7394417Z #34 3.929 copying vllm/transformers_utils/dynamic_module.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7395471Z #34 3.929 copying vllm/transformers_utils/processor.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7396477Z #34 3.930 copying vllm/transformers_utils/s3_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7397541Z #34 3.930 copying vllm/transformers_utils/tokenizer.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7398595Z #34 3.930 copying vllm/transformers_utils/tokenizer_base.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7399660Z #34 3.930 copying vllm/transformers_utils/tokenizer_group.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7400688Z #34 3.930 copying vllm/transformers_utils/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils 2025-09-07T06:54:18.7401487Z #34 3.931 creating build/lib.linux-x86_64-cpython-312/vllm/triton_utils 2025-09-07T06:54:18.7402195Z #34 3.931 copying vllm/triton_utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/triton_utils 2025-09-07T06:54:18.7403060Z #34 3.931 copying vllm/triton_utils/importing.py -> build/lib.linux-x86_64-cpython-312/vllm/triton_utils 2025-09-07T06:54:18.7403758Z #34 3.931 creating build/lib.linux-x86_64-cpython-312/vllm/usage 2025-09-07T06:54:18.7404494Z #34 3.931 copying vllm/usage/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/usage 2025-09-07T06:54:18.7405211Z #34 3.932 copying vllm/usage/usage_lib.py -> build/lib.linux-x86_64-cpython-312/vllm/usage 2025-09-07T06:54:18.7405879Z #34 3.932 creating 
build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T06:54:18.7406484Z #34 3.932 copying vllm/utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T06:54:18.7407189Z #34 3.932 copying vllm/utils/deep_gemm.py -> build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T06:54:18.7408377Z #34 3.932 copying vllm/utils/flashinfer.py -> build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T06:54:18.7409129Z #34 3.933 copying vllm/utils/jsontree.py -> build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T06:54:18.7409872Z #34 3.933 copying vllm/utils/tensor_schema.py -> build/lib.linux-x86_64-cpython-312/vllm/utils 2025-09-07T06:54:18.7410510Z #34 3.933 creating build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T06:54:18.7411065Z #34 3.933 copying vllm/v1/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T06:54:18.7411788Z #34 3.934 copying vllm/v1/cudagraph_dispatcher.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T06:54:18.7412631Z #34 3.934 copying vllm/v1/kv_cache_interface.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T06:54:18.7413537Z #34 3.934 copying vllm/v1/outputs.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T06:54:18.7414226Z #34 3.934 copying vllm/v1/request.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T06:54:18.7414923Z #34 3.934 copying vllm/v1/serial_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T06:54:18.7415621Z #34 3.935 copying vllm/v1/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1 2025-09-07T06:54:18.7416214Z #34 3.935 creating build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T06:54:18.7416861Z #34 3.935 copying vllm/worker/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T06:54:18.7417695Z #34 3.935 copying vllm/worker/cache_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T06:54:18.7418519Z #34 3.936 copying vllm/worker/enc_dec_model_runner.py -> 
build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T06:54:18.7419354Z #34 3.936 copying vllm/worker/model_runner.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T06:54:18.7420167Z #34 3.936 copying vllm/worker/model_runner_base.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T06:54:18.7420964Z #34 3.936 copying vllm/worker/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T06:54:18.7421720Z #34 3.936 copying vllm/worker/worker.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T06:54:18.7422480Z #34 3.937 copying vllm/worker/worker_base.py -> build/lib.linux-x86_64-cpython-312/vllm/worker 2025-09-07T06:54:18.7423210Z #34 3.937 creating build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7424060Z #34 3.937 copying vllm/attention/backends/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7425179Z #34 3.937 copying vllm/attention/backends/abstract.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7426267Z #34 3.938 copying vllm/attention/backends/differential_flash_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7427387Z #34 3.938 copying vllm/attention/backends/dual_chunk_flash_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7428452Z #34 3.938 copying vllm/attention/backends/flash_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7429443Z #34 3.938 copying vllm/attention/backends/flashmla.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7430490Z #34 3.939 copying vllm/attention/backends/placeholder_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7431558Z #34 3.939 copying vllm/attention/backends/rocm_aiter_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 
2025-09-07T06:54:18.7432588Z #34 3.939 copying vllm/attention/backends/rocm_flash_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7433648Z #34 3.939 copying vllm/attention/backends/triton_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7434618Z #34 3.940 copying vllm/attention/backends/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7435638Z #34 3.940 copying vllm/attention/backends/xformers.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends 2025-09-07T06:54:18.7436449Z #34 3.940 creating build/lib.linux-x86_64-cpython-312/vllm/attention/layers 2025-09-07T06:54:18.7437210Z #34 3.940 copying vllm/attention/layers/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/layers 2025-09-07T06:54:18.7438226Z #34 3.940 copying vllm/attention/layers/chunked_local_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/layers 2025-09-07T06:54:18.7439316Z #34 3.941 copying vllm/attention/layers/encoder_only_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/layers 2025-09-07T06:54:18.7440165Z #34 3.941 creating build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7440887Z #34 3.941 copying vllm/attention/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7441838Z #34 3.941 copying vllm/attention/ops/chunked_prefill_paged_decode.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7442810Z #34 3.942 copying vllm/attention/ops/common.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7443674Z #34 3.942 copying vllm/attention/ops/flashmla.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7444635Z #34 3.942 copying vllm/attention/ops/merge_attn_states.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7445564Z #34 3.942 copying 
vllm/attention/ops/paged_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7446503Z #34 3.942 copying vllm/attention/ops/pallas_kv_cache_update.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7447478Z #34 3.943 copying vllm/attention/ops/prefix_prefill.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7448397Z #34 3.943 copying vllm/attention/ops/rocm_aiter_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7449352Z #34 3.943 copying vllm/attention/ops/rocm_aiter_paged_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7450378Z #34 3.943 copying vllm/attention/ops/triton_decode_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7451421Z #34 3.944 copying vllm/attention/ops/triton_flash_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7452545Z #34 3.944 copying vllm/attention/ops/triton_merge_attn_states.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7453775Z #34 3.944 copying vllm/attention/ops/triton_unified_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/ops 2025-09-07T06:54:18.7454636Z #34 3.944 creating build/lib.linux-x86_64-cpython-312/vllm/attention/utils 2025-09-07T06:54:18.7455421Z #34 3.944 copying vllm/attention/utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/utils 2025-09-07T06:54:18.7456343Z #34 3.945 copying vllm/attention/utils/fa_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/utils 2025-09-07T06:54:18.7457323Z #34 3.945 copying vllm/attention/utils/kv_sharing_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/utils 2025-09-07T06:54:18.7458161Z #34 3.945 creating build/lib.linux-x86_64-cpython-312/vllm/attention/backends/mla 2025-09-07T06:54:18.7459055Z #34 3.945 copying vllm/attention/backends/mla/__init__.py -> 
build/lib.linux-x86_64-cpython-312/vllm/attention/backends/mla 2025-09-07T06:54:18.7460144Z #34 3.945 copying vllm/attention/backends/mla/common.py -> build/lib.linux-x86_64-cpython-312/vllm/attention/backends/mla 2025-09-07T06:54:18.7461036Z #34 3.946 creating build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib 2025-09-07T06:54:18.7461800Z #34 3.946 copying vllm/benchmarks/lib/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib 2025-09-07T06:54:18.7462773Z #34 3.946 copying vllm/benchmarks/lib/endpoint_request_func.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib 2025-09-07T06:54:18.7463829Z #34 3.946 copying vllm/benchmarks/lib/ready_checker.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib 2025-09-07T06:54:18.7464875Z #34 3.947 copying vllm/benchmarks/lib/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib 2025-09-07T06:54:18.7465582Z #34 3.947 creating build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T06:54:18.7466265Z #34 3.947 copying vllm/core/block/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T06:54:18.7467068Z #34 3.947 copying vllm/core/block/block_table.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T06:54:18.7467889Z #34 3.948 copying vllm/core/block/common.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T06:54:18.7468760Z #34 3.948 copying vllm/core/block/cpu_gpu_block_allocator.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T06:54:18.7469642Z #34 3.948 copying vllm/core/block/interfaces.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T06:54:18.7470488Z #34 3.948 copying vllm/core/block/naive_block.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T06:54:18.7471362Z #34 3.948 copying vllm/core/block/prefix_caching_block.py -> build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T06:54:18.7472252Z #34 3.949 copying vllm/core/block/utils.py -> 
build/lib.linux-x86_64-cpython-312/vllm/core/block 2025-09-07T06:54:18.7473032Z #34 3.949 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7474017Z #34 3.949 copying vllm/distributed/device_communicators/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7475260Z #34 3.949 copying vllm/distributed/device_communicators/all2all.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7476535Z #34 3.950 copying vllm/distributed/device_communicators/all_reduce_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7477904Z #34 3.950 copying vllm/distributed/device_communicators/base_device_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7479313Z #34 3.950 copying vllm/distributed/device_communicators/cpu_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7480642Z #34 3.950 copying vllm/distributed/device_communicators/cuda_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7481966Z #34 3.951 copying vllm/distributed/device_communicators/cuda_wrapper.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7483279Z #34 3.951 copying vllm/distributed/device_communicators/custom_all_reduce.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7484560Z #34 3.951 copying vllm/distributed/device_communicators/pynccl.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7485845Z #34 3.951 copying vllm/distributed/device_communicators/pynccl_wrapper.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7487180Z #34 3.951 
copying vllm/distributed/device_communicators/quick_all_reduce.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7488496Z #34 3.952 copying vllm/distributed/device_communicators/ray_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7489851Z #34 3.952 copying vllm/distributed/device_communicators/shm_broadcast.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7491117Z #34 3.952 copying vllm/distributed/device_communicators/symm_mem.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7492642Z #34 3.952 copying vllm/distributed/device_communicators/tpu_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7494192Z #34 3.952 copying vllm/distributed/device_communicators/xpu_communicator.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators 2025-09-07T06:54:18.7495197Z #34 3.953 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb 2025-09-07T06:54:18.7495998Z #34 3.953 copying vllm/distributed/eplb/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb 2025-09-07T06:54:18.7496973Z #34 3.953 copying vllm/distributed/eplb/eplb_state.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb 2025-09-07T06:54:18.7497988Z #34 3.953 copying vllm/distributed/eplb/rebalance_algo.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb 2025-09-07T06:54:18.7499048Z #34 3.954 copying vllm/distributed/eplb/rebalance_execute.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb 2025-09-07T06:54:18.7499927Z #34 3.954 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer 2025-09-07T06:54:18.7500819Z #34 3.954 copying vllm/distributed/kv_transfer/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer 
2025-09-07T06:54:18.7502015Z #34 3.954 copying vllm/distributed/kv_transfer/kv_transfer_state.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer 2025-09-07T06:54:18.7503002Z #34 3.955 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector 2025-09-07T06:54:18.7504075Z #34 3.955 copying vllm/distributed/kv_transfer/kv_connector/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector 2025-09-07T06:54:18.7505383Z #34 3.955 copying vllm/distributed/kv_transfer/kv_connector/base.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector 2025-09-07T06:54:18.7506786Z #34 3.955 copying vllm/distributed/kv_transfer/kv_connector/factory.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector 2025-09-07T06:54:18.7508085Z #34 3.955 copying vllm/distributed/kv_transfer/kv_connector/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector 2025-09-07T06:54:18.7509139Z #34 3.956 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T06:54:18.7510282Z #34 3.956 copying vllm/distributed/kv_transfer/kv_lookup_buffer/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T06:54:18.7511630Z #34 3.956 copying vllm/distributed/kv_transfer/kv_lookup_buffer/base.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T06:54:18.7513015Z #34 3.956 copying vllm/distributed/kv_transfer/kv_lookup_buffer/mooncake_store.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T06:54:18.7514453Z #34 3.956 copying vllm/distributed/kv_transfer/kv_lookup_buffer/simple_buffer.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T06:54:18.7515534Z #34 3.957 creating 
build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T06:54:18.7516507Z #34 3.957 copying vllm/distributed/kv_transfer/kv_pipe/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T06:54:18.7517682Z #34 3.957 copying vllm/distributed/kv_transfer/kv_pipe/base.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T06:54:18.7518883Z #34 3.957 copying vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T06:54:18.7520191Z #34 3.957 copying vllm/distributed/kv_transfer/kv_pipe/pynccl_pipe.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T06:54:18.7521224Z #34 3.958 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T06:54:18.7522337Z #34 3.958 copying vllm/distributed/kv_transfer/kv_connector/v1/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T06:54:18.7523671Z #34 3.958 copying vllm/distributed/kv_transfer/kv_connector/v1/base.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T06:54:18.7525058Z #34 3.958 copying vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T06:54:18.7526509Z #34 3.959 copying vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T06:54:18.7527947Z #34 3.959 copying vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T06:54:18.7529415Z #34 3.959 copying vllm/distributed/kv_transfer/kv_connector/v1/shared_storage_connector.py -> 
build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T06:54:18.7530624Z #34 3.959 creating build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T06:54:18.7531797Z #34 3.960 copying vllm/distributed/kv_transfer/kv_connector/v1/p2p/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T06:54:18.7533535Z #34 3.960 copying vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T06:54:18.7535112Z #34 3.960 copying vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T06:54:18.7536675Z #34 3.960 copying vllm/distributed/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T06:54:18.7537805Z #34 3.960 creating build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing 2025-09-07T06:54:18.7538706Z #34 3.961 copying vllm/engine/multiprocessing/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing 2025-09-07T06:54:18.7539838Z #34 3.961 copying vllm/engine/multiprocessing/client.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing 2025-09-07T06:54:18.7540948Z #34 3.961 copying vllm/engine/multiprocessing/engine.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing 2025-09-07T06:54:18.7541849Z #34 3.961 creating build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T06:54:18.7542731Z #34 3.962 copying vllm/engine/output_processor/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T06:54:18.7543835Z #34 3.962 copying vllm/engine/output_processor/interfaces.py -> 
build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T06:54:18.7545058Z #34 3.962 copying vllm/engine/output_processor/single_step.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T06:54:18.7546159Z #34 3.962 copying vllm/engine/output_processor/stop_checker.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T06:54:18.7547234Z #34 3.962 copying vllm/engine/output_processor/util.py -> build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor 2025-09-07T06:54:18.7548117Z #34 3.963 creating build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T06:54:18.7548910Z #34 3.963 copying vllm/entrypoints/cli/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T06:54:18.7549822Z #34 3.963 copying vllm/entrypoints/cli/collect_env.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T06:54:18.7550741Z #34 3.963 copying vllm/entrypoints/cli/main.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T06:54:18.7551665Z #34 3.963 copying vllm/entrypoints/cli/openai.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T06:54:18.7552566Z #34 3.964 copying vllm/entrypoints/cli/run_batch.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T06:54:18.7553481Z #34 3.964 copying vllm/entrypoints/cli/serve.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T06:54:18.7554362Z #34 3.964 copying vllm/entrypoints/cli/types.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli 2025-09-07T06:54:18.7555128Z #34 3.964 creating build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7555940Z #34 3.965 copying vllm/entrypoints/openai/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7556922Z #34 3.965 copying vllm/entrypoints/openai/api_server.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 
2025-09-07T06:54:18.7557922Z #34 3.965 copying vllm/entrypoints/openai/cli_args.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7559036Z #34 3.965 copying vllm/entrypoints/openai/logits_processors.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7569944Z #34 3.965 copying vllm/entrypoints/openai/protocol.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7571061Z #34 3.966 copying vllm/entrypoints/openai/run_batch.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7572095Z #34 3.966 copying vllm/entrypoints/openai/serving_chat.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7573504Z #34 3.966 copying vllm/entrypoints/openai/serving_classification.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7574663Z #34 3.966 copying vllm/entrypoints/openai/serving_completion.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7575807Z #34 3.967 copying vllm/entrypoints/openai/serving_embedding.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7576906Z #34 3.967 copying vllm/entrypoints/openai/serving_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7578046Z #34 3.967 copying vllm/entrypoints/openai/serving_models.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7579157Z #34 3.967 copying vllm/entrypoints/openai/serving_pooling.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7580260Z #34 3.968 copying vllm/entrypoints/openai/serving_responses.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7581377Z #34 3.968 copying vllm/entrypoints/openai/serving_score.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7582500Z #34 
3.968 copying vllm/entrypoints/openai/serving_tokenization.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7583667Z #34 3.968 copying vllm/entrypoints/openai/serving_transcription.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7584801Z #34 3.969 copying vllm/entrypoints/openai/speech_to_text.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai 2025-09-07T06:54:18.7585776Z #34 3.969 creating build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T06:54:18.7586690Z #34 3.969 copying vllm/entrypoints/cli/benchmark/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T06:54:18.7587796Z #34 3.969 copying vllm/entrypoints/cli/benchmark/base.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T06:54:18.7588939Z #34 3.969 copying vllm/entrypoints/cli/benchmark/latency.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T06:54:18.7590053Z #34 3.970 copying vllm/entrypoints/cli/benchmark/main.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T06:54:18.7591181Z #34 3.970 copying vllm/entrypoints/cli/benchmark/serve.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T06:54:18.7592460Z #34 3.970 copying vllm/entrypoints/cli/benchmark/throughput.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark 2025-09-07T06:54:18.7593645Z #34 3.971 creating build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7594659Z #34 3.971 copying vllm/entrypoints/openai/tool_parsers/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7595995Z #34 3.971 copying vllm/entrypoints/openai/tool_parsers/abstract_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7597394Z #34 
3.971 copying vllm/entrypoints/openai/tool_parsers/deepseekv31_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7598823Z #34 3.971 copying vllm/entrypoints/openai/tool_parsers/deepseekv3_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7600222Z #34 3.972 copying vllm/entrypoints/openai/tool_parsers/glm4_moe_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7601682Z #34 3.972 copying vllm/entrypoints/openai/tool_parsers/granite_20b_fc_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7603087Z #34 3.972 copying vllm/entrypoints/openai/tool_parsers/granite_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7604593Z #34 3.972 copying vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7605939Z #34 3.972 copying vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7607321Z #34 3.973 copying vllm/entrypoints/openai/tool_parsers/internlm2_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7608709Z #34 3.973 copying vllm/entrypoints/openai/tool_parsers/jamba_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7610025Z #34 3.973 copying vllm/entrypoints/openai/tool_parsers/kimi_k2_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7611404Z #34 3.973 copying vllm/entrypoints/openai/tool_parsers/llama4_pythonic_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 
2025-09-07T06:54:18.7613009Z #34 3.974 copying vllm/entrypoints/openai/tool_parsers/llama_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7614391Z #34 3.974 copying vllm/entrypoints/openai/tool_parsers/minimax_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7615781Z #34 3.974 copying vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7617155Z #34 3.974 copying vllm/entrypoints/openai/tool_parsers/openai_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7618548Z #34 3.974 copying vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7620011Z #34 3.975 copying vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7621411Z #34 3.975 copying vllm/entrypoints/openai/tool_parsers/qwen3coder_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7622859Z #34 3.975 copying vllm/entrypoints/openai/tool_parsers/seed_oss_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7624230Z #34 3.975 copying vllm/entrypoints/openai/tool_parsers/step3_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7625617Z #34 3.976 copying vllm/entrypoints/openai/tool_parsers/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 2025-09-07T06:54:18.7626877Z #34 3.976 copying vllm/entrypoints/openai/tool_parsers/xlam_tool_parser.py -> build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers 
2025-09-07T06:54:18.7627797Z #34 3.976 creating build/lib.linux-x86_64-cpython-312/vllm/lora/ops 2025-09-07T06:54:18.7628455Z #34 3.976 copying vllm/lora/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops 2025-09-07T06:54:18.7629158Z #34 3.977 creating build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T06:54:18.7629958Z #34 3.977 copying vllm/lora/punica_wrapper/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T06:54:18.7630951Z #34 3.977 copying vllm/lora/punica_wrapper/punica_base.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T06:54:18.7631978Z #34 3.977 copying vllm/lora/punica_wrapper/punica_cpu.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T06:54:18.7632984Z #34 3.977 copying vllm/lora/punica_wrapper/punica_gpu.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T06:54:18.7634022Z #34 3.978 copying vllm/lora/punica_wrapper/punica_selector.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T06:54:18.7635039Z #34 3.978 copying vllm/lora/punica_wrapper/punica_tpu.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T06:54:18.7636039Z #34 3.978 copying vllm/lora/punica_wrapper/punica_xpu.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T06:54:18.7637010Z #34 3.978 copying vllm/lora/punica_wrapper/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper 2025-09-07T06:54:18.7637806Z #34 3.978 creating build/lib.linux-x86_64-cpython-312/vllm/lora/ops/ipex_ops 2025-09-07T06:54:18.7638622Z #34 3.979 copying vllm/lora/ops/ipex_ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/ipex_ops 2025-09-07T06:54:18.7639536Z #34 3.979 copying vllm/lora/ops/ipex_ops/lora_ops.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/ipex_ops 2025-09-07T06:54:18.7640319Z #34 3.979 creating 
build/lib.linux-x86_64-cpython-312/vllm/lora/ops/torch_ops 2025-09-07T06:54:18.7641100Z #34 3.979 copying vllm/lora/ops/torch_ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/torch_ops 2025-09-07T06:54:18.7642047Z #34 3.979 copying vllm/lora/ops/torch_ops/lora_ops.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/torch_ops 2025-09-07T06:54:18.7642848Z #34 3.980 creating build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T06:54:18.7643642Z #34 3.980 copying vllm/lora/ops/triton_ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T06:54:18.7644636Z #34 3.980 copying vllm/lora/ops/triton_ops/kernel_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T06:54:18.7645655Z #34 3.980 copying vllm/lora/ops/triton_ops/lora_expand_op.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T06:54:18.7646720Z #34 3.981 copying vllm/lora/ops/triton_ops/lora_kernel_metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T06:54:18.7647817Z #34 3.981 copying vllm/lora/ops/triton_ops/lora_shrink_op.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T06:54:18.7648799Z #34 3.981 copying vllm/lora/ops/triton_ops/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops 2025-09-07T06:54:18.7649621Z #34 3.981 creating build/lib.linux-x86_64-cpython-312/vllm/lora/ops/xla_ops 2025-09-07T06:54:18.7650365Z #34 3.981 copying vllm/lora/ops/xla_ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/xla_ops 2025-09-07T06:54:18.7651264Z #34 3.982 copying vllm/lora/ops/xla_ops/lora_ops.py -> build/lib.linux-x86_64-cpython-312/vllm/lora/ops/xla_ops 2025-09-07T06:54:18.7652050Z #34 3.982 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7653131Z #34 3.982 copying vllm/model_executor/layers/__init__.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7654194Z #34 3.982 copying vllm/model_executor/layers/activation.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7655315Z #34 3.982 copying vllm/model_executor/layers/attention_layer_base.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7656441Z #34 3.983 copying vllm/model_executor/layers/layernorm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7657539Z #34 3.983 copying vllm/model_executor/layers/lightning_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7658601Z #34 3.983 copying vllm/model_executor/layers/linear.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7659745Z #34 3.983 copying vllm/model_executor/layers/logits_processor.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7660816Z #34 3.984 copying vllm/model_executor/layers/mla.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7661819Z #34 3.984 copying vllm/model_executor/layers/pooler.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7662870Z #34 3.984 copying vllm/model_executor/layers/resampler.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7663917Z #34 3.984 copying vllm/model_executor/layers/sampler.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7665050Z #34 3.984 copying vllm/model_executor/layers/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7666169Z #34 3.985 copying vllm/model_executor/layers/vocab_parallel_embedding.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers 2025-09-07T06:54:18.7667105Z #34 3.985 creating 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7668010Z #34 3.985 copying vllm/model_executor/model_loader/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7669117Z #34 3.985 copying vllm/model_executor/model_loader/base_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7670306Z #34 3.986 copying vllm/model_executor/model_loader/bitsandbytes_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7671522Z #34 3.986 copying vllm/model_executor/model_loader/default_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7672672Z #34 3.986 copying vllm/model_executor/model_loader/dummy_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7673823Z #34 3.986 copying vllm/model_executor/model_loader/gguf_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7675021Z #34 3.987 copying vllm/model_executor/model_loader/runai_streamer_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7676275Z #34 3.987 copying vllm/model_executor/model_loader/sharded_state_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7677463Z #34 3.987 copying vllm/model_executor/model_loader/tensorizer.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7678669Z #34 3.987 copying vllm/model_executor/model_loader/tensorizer_loader.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7679812Z #34 3.988 copying vllm/model_executor/model_loader/tpu.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7680902Z #34 3.988 copying 
vllm/model_executor/model_loader/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7682011Z #34 3.988 copying vllm/model_executor/model_loader/weight_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader 2025-09-07T06:54:18.7682925Z #34 3.989 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7683750Z #34 3.989 copying vllm/model_executor/models/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7684749Z #34 3.990 copying vllm/model_executor/models/adapters.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7685765Z #34 3.990 copying vllm/model_executor/models/aimv2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7686750Z #34 3.990 copying vllm/model_executor/models/apertus.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7687778Z #34 3.990 copying vllm/model_executor/models/arcee.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7688779Z #34 3.991 copying vllm/model_executor/models/arctic.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7689757Z #34 3.991 copying vllm/model_executor/models/aria.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7690757Z #34 3.991 copying vllm/model_executor/models/aya_vision.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7691776Z #34 3.991 copying vllm/model_executor/models/baichuan.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7693208Z #34 3.991 copying vllm/model_executor/models/bailing_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7694260Z #34 3.992 copying vllm/model_executor/models/bamba.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7695328Z #34 3.992 copying vllm/model_executor/models/bart.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7696340Z #34 3.992 copying vllm/model_executor/models/bert.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7697381Z #34 3.992 copying vllm/model_executor/models/bert_with_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7698443Z #34 3.993 copying vllm/model_executor/models/blip.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7699459Z #34 3.993 copying vllm/model_executor/models/blip2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7700457Z #34 3.993 copying vllm/model_executor/models/bloom.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7701500Z #34 3.993 copying vllm/model_executor/models/chameleon.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7702566Z #34 3.994 copying vllm/model_executor/models/chatglm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7703581Z #34 3.994 copying vllm/model_executor/models/clip.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7704690Z #34 3.994 copying vllm/model_executor/models/cohere2_vision.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7705870Z #34 3.994 copying vllm/model_executor/models/commandr.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7706927Z #34 3.994 copying vllm/model_executor/models/config.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7707989Z #34 3.995 copying vllm/model_executor/models/constant_size_cache.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7709029Z #34 3.995 copying vllm/model_executor/models/dbrx.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7710029Z #34 3.995 copying vllm/model_executor/models/deepseek.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7711067Z #34 3.995 copying vllm/model_executor/models/deepseek_eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7712139Z #34 3.996 copying vllm/model_executor/models/deepseek_mtp.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7713186Z #34 3.996 copying vllm/model_executor/models/deepseek_v2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7714222Z #34 3.996 copying vllm/model_executor/models/deepseek_vl2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7715243Z #34 3.996 copying vllm/model_executor/models/donut.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7716255Z #34 3.996 copying vllm/model_executor/models/dots1.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7717258Z #34 3.997 copying vllm/model_executor/models/ernie45.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7718292Z #34 3.997 copying vllm/model_executor/models/ernie45_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7719315Z #34 3.997 copying vllm/model_executor/models/ernie45_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7720357Z #34 3.997 copying vllm/model_executor/models/ernie45_vl_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7721402Z #34 3.998 copying vllm/model_executor/models/ernie_mtp.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7722402Z #34 3.998 copying vllm/model_executor/models/exaone.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7723447Z #34 3.998 copying vllm/model_executor/models/exaone4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7724481Z #34 3.998 copying vllm/model_executor/models/fairseq2_llama.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7725522Z #34 3.999 copying vllm/model_executor/models/falcon.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7726528Z #34 3.999 copying vllm/model_executor/models/falcon_h1.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7727543Z #34 3.999 copying vllm/model_executor/models/florence2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7728550Z #34 3.999 copying vllm/model_executor/models/fuyu.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7729525Z #34 3.999 copying vllm/model_executor/models/gemma.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7730519Z #34 4.000 copying vllm/model_executor/models/gemma2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7731518Z #34 4.000 copying vllm/model_executor/models/gemma3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7732625Z #34 4.000 copying vllm/model_executor/models/gemma3_mm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7733844Z #34 4.000 copying vllm/model_executor/models/gemma3n.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7734924Z #34 4.001 copying vllm/model_executor/models/gemma3n_mm.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7735954Z #34 4.001 copying vllm/model_executor/models/glm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7736964Z #34 4.001 copying vllm/model_executor/models/glm4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7737963Z #34 4.001 copying vllm/model_executor/models/glm4_1v.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7738990Z #34 4.001 copying vllm/model_executor/models/glm4_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7740046Z #34 4.002 copying vllm/model_executor/models/glm4_moe_mtp.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7741081Z #34 4.002 copying vllm/model_executor/models/glm4v.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7742092Z #34 4.002 copying vllm/model_executor/models/gpt2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7743119Z #34 4.002 copying vllm/model_executor/models/gpt_bigcode.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7744188Z #34 4.003 copying vllm/model_executor/models/gpt_j.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7745312Z #34 4.003 copying vllm/model_executor/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7746297Z #34 4.003 copying vllm/model_executor/models/gpt_oss.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7747310Z #34 4.003 copying vllm/model_executor/models/granite.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7748342Z #34 4.003 copying vllm/model_executor/models/granite_speech.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7749415Z #34 4.004 copying vllm/model_executor/models/granitemoe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7750514Z #34 4.004 copying vllm/model_executor/models/granitemoehybrid.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7751658Z #34 4.004 copying vllm/model_executor/models/granitemoeshared.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7752726Z #34 4.004 copying vllm/model_executor/models/gritlm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7753707Z #34 4.005 copying vllm/model_executor/models/grok1.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7754688Z #34 4.005 copying vllm/model_executor/models/h2ovl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7755695Z #34 4.005 copying vllm/model_executor/models/hunyuan_v1.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7756781Z #34 4.005 copying vllm/model_executor/models/hyperclovax_vision.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7757911Z #34 4.006 copying vllm/model_executor/models/idefics2_vision_model.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7759000Z #34 4.006 copying vllm/model_executor/models/idefics3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7760022Z #34 4.006 copying vllm/model_executor/models/interfaces.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7761119Z #34 4.006 copying vllm/model_executor/models/interfaces_base.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7762182Z #34 4.007 copying 
vllm/model_executor/models/intern_vit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7763248Z #34 4.007 copying vllm/model_executor/models/internlm2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7764290Z #34 4.007 copying vllm/model_executor/models/internlm2_ve.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7765333Z #34 4.007 copying vllm/model_executor/models/interns1.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7766356Z #34 4.008 copying vllm/model_executor/models/interns1_vit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7767395Z #34 4.008 copying vllm/model_executor/models/internvl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7768386Z #34 4.008 copying vllm/model_executor/models/jais.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7769372Z #34 4.009 copying vllm/model_executor/models/jamba.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7770344Z #34 4.009 copying vllm/model_executor/models/jina_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7771321Z #34 4.009 copying vllm/model_executor/models/keye.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7772329Z #34 4.009 copying vllm/model_executor/models/keye_vl1_5.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7773607Z #34 4.010 copying vllm/model_executor/models/kimi_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7774617Z #34 4.010 copying vllm/model_executor/models/lfm2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7775603Z #34 4.010 copying vllm/model_executor/models/llama.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7776619Z #34 4.010 copying vllm/model_executor/models/llama4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7777680Z #34 4.011 copying vllm/model_executor/models/llama4_eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7778737Z #34 4.011 copying vllm/model_executor/models/llama_eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7779848Z #34 4.011 copying vllm/model_executor/models/llama_eagle3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7780877Z #34 4.012 copying vllm/model_executor/models/llava.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7781915Z #34 4.012 copying vllm/model_executor/models/llava_next.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7782996Z #34 4.012 copying vllm/model_executor/models/llava_next_video.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7784098Z #34 4.012 copying vllm/model_executor/models/llava_onevision.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7785257Z #34 4.013 copying vllm/model_executor/models/mamba.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7786228Z #34 4.013 copying vllm/model_executor/models/mamba2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7787248Z #34 4.013 copying vllm/model_executor/models/mamba_cache.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7788256Z #34 4.013 copying vllm/model_executor/models/medusa.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7789299Z #34 4.014 copying vllm/model_executor/models/midashenglm.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7790308Z #34 4.014 copying vllm/model_executor/models/mimo.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7791306Z #34 4.014 copying vllm/model_executor/models/mimo_mtp.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7792430Z #34 4.014 copying vllm/model_executor/models/minicpm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7793647Z #34 4.015 copying vllm/model_executor/models/minicpm3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7794711Z #34 4.015 copying vllm/model_executor/models/minicpm_eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7795783Z #34 4.015 copying vllm/model_executor/models/minicpmo.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7796828Z #34 4.015 copying vllm/model_executor/models/minicpmv.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7797890Z #34 4.016 copying vllm/model_executor/models/minimax_cache.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7798983Z #34 4.016 copying vllm/model_executor/models/minimax_text_01.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7800055Z #34 4.016 copying vllm/model_executor/models/minimax_vl_01.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7801176Z #34 4.017 copying vllm/model_executor/models/mistral3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7802209Z #34 4.017 copying vllm/model_executor/models/mixtral.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7803264Z #34 4.017 copying vllm/model_executor/models/mixtral_quant.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7804320Z #34 4.017 copying vllm/model_executor/models/mllama.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7805437Z #34 4.018 copying vllm/model_executor/models/mllama4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7806476Z #34 4.018 copying vllm/model_executor/models/mlp_speculator.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7807533Z #34 4.018 copying vllm/model_executor/models/modernbert.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7808624Z #34 4.018 copying vllm/model_executor/models/module_mapping.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7809655Z #34 4.019 copying vllm/model_executor/models/molmo.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7810647Z #34 4.019 copying vllm/model_executor/models/moonvit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7811618Z #34 4.019 copying vllm/model_executor/models/mpt.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7812667Z #34 4.020 copying vllm/model_executor/models/nemotron.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7813885Z #34 4.020 copying vllm/model_executor/models/nemotron_h.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7814955Z #34 4.020 copying vllm/model_executor/models/nemotron_nas.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7816026Z #34 4.020 copying vllm/model_executor/models/nemotron_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7817049Z #34 4.021 copying vllm/model_executor/models/nvlm_d.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7818101Z #34 4.021 copying vllm/model_executor/models/olmo.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7819094Z #34 4.021 copying vllm/model_executor/models/olmo2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7820136Z #34 4.021 copying vllm/model_executor/models/olmoe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7821130Z #34 4.022 copying vllm/model_executor/models/opt.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7822119Z #34 4.022 copying vllm/model_executor/models/orion.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7823114Z #34 4.022 copying vllm/model_executor/models/ovis.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7824109Z #34 4.022 copying vllm/model_executor/models/ovis2_5.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7825255Z #34 4.023 copying vllm/model_executor/models/paligemma.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7826280Z #34 4.023 copying vllm/model_executor/models/persimmon.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7827265Z #34 4.023 copying vllm/model_executor/models/phi.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7828223Z #34 4.024 copying vllm/model_executor/models/phi3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7829212Z #34 4.024 copying vllm/model_executor/models/phi3v.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7830246Z #34 4.024 copying vllm/model_executor/models/phi4_multimodal.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 
2025-09-07T06:54:18.7831303Z #34 4.024 copying vllm/model_executor/models/phi4flash.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7832298Z #34 4.025 copying vllm/model_executor/models/phi4mm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7833314Z #34 4.025 copying vllm/model_executor/models/phi4mm_audio.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7834365Z #34 4.025 copying vllm/model_executor/models/phi4mm_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7835372Z #34 4.025 copying vllm/model_executor/models/phimoe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7836404Z #34 4.026 copying vllm/model_executor/models/pixtral.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7837395Z #34 4.026 copying vllm/model_executor/models/plamo2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.7838372Z #34 4.026 copying vllm/model_executor/models/qwen.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8383453Z #34 4.026 copying vllm/model_executor/models/qwen2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8384595Z #34 4.027 copying vllm/model_executor/models/qwen2_5_omni_thinker.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8385721Z #34 4.027 copying vllm/model_executor/models/qwen2_5_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8386911Z #34 4.027 copying vllm/model_executor/models/qwen2_audio.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8387938Z #34 4.028 copying vllm/model_executor/models/qwen2_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 
2025-09-07T06:54:18.8388950Z #34 4.028 copying vllm/model_executor/models/qwen2_rm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8390103Z #34 4.028 copying vllm/model_executor/models/qwen2_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8391091Z #34 4.029 copying vllm/model_executor/models/qwen3.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8392314Z #34 4.029 copying vllm/model_executor/models/qwen3_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8393501Z #34 4.029 copying vllm/model_executor/models/qwen_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8394544Z #34 4.029 copying vllm/model_executor/models/registry.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8395593Z #34 4.030 copying vllm/model_executor/models/roberta.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8396596Z #34 4.030 copying vllm/model_executor/models/rvl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8397612Z #34 4.030 copying vllm/model_executor/models/seed_oss.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8398634Z #34 4.030 copying vllm/model_executor/models/siglip.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8399714Z #34 4.031 copying vllm/model_executor/models/siglip2navit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8400810Z #34 4.031 copying vllm/model_executor/models/skyworkr1v.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8401917Z #34 4.031 copying vllm/model_executor/models/smolvlm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8402936Z #34 4.031 
copying vllm/model_executor/models/solar.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8404074Z #34 4.032 copying vllm/model_executor/models/stablelm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8405089Z #34 4.032 copying vllm/model_executor/models/starcoder2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8406112Z #34 4.032 copying vllm/model_executor/models/step3_text.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8407103Z #34 4.032 copying vllm/model_executor/models/step3_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8408076Z #34 4.033 copying vllm/model_executor/models/swin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8409094Z #34 4.033 copying vllm/model_executor/models/tarsier.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8410099Z #34 4.033 copying vllm/model_executor/models/telechat2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8411108Z #34 4.034 copying vllm/model_executor/models/teleflm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8412113Z #34 4.034 copying vllm/model_executor/models/terratorch.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8413444Z #34 4.034 copying vllm/model_executor/models/transformers.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8414512Z #34 4.034 copying vllm/model_executor/models/ultravox.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8415523Z #34 4.035 copying vllm/model_executor/models/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8416538Z #34 4.035 copying 
vllm/model_executor/models/vision.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8417553Z #34 4.035 copying vllm/model_executor/models/voxtral.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8418643Z #34 4.035 copying vllm/model_executor/models/whisper.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8419672Z #34 4.036 copying vllm/model_executor/models/zamba2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/models 2025-09-07T06:54:18.8420548Z #34 4.036 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup 2025-09-07T06:54:18.8421396Z #34 4.036 copying vllm/model_executor/warmup/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup 2025-09-07T06:54:18.8422461Z #34 4.036 copying vllm/model_executor/warmup/deep_gemm_warmup.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup 2025-09-07T06:54:18.8423562Z #34 4.037 copying vllm/model_executor/warmup/kernel_warmup.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup 2025-09-07T06:54:18.8424595Z #34 4.037 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8425553Z #34 4.037 copying vllm/model_executor/layers/fused_moe/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8426798Z #34 4.038 copying vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8428151Z #34 4.038 copying vllm/model_executor/layers/fused_moe/batched_triton_or_deep_gemm_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8429431Z #34 4.038 copying vllm/model_executor/layers/fused_moe/config.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8430664Z #34 
4.039 copying vllm/model_executor/layers/fused_moe/cpu_fused_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8431879Z #34 4.039 copying vllm/model_executor/layers/fused_moe/cutlass_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8433099Z #34 4.039 copying vllm/model_executor/layers/fused_moe/deep_gemm_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8434338Z #34 4.039 copying vllm/model_executor/layers/fused_moe/deep_gemm_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8435636Z #34 4.040 copying vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8436994Z #34 4.040 copying vllm/model_executor/layers/fused_moe/deepep_ll_prepare_finalize.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8438368Z #34 4.040 copying vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8439766Z #34 4.040 copying vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8441132Z #34 4.041 copying vllm/model_executor/layers/fused_moe/fused_batched_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8442394Z #34 4.041 copying vllm/model_executor/layers/fused_moe/fused_marlin_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8443608Z #34 4.041 copying vllm/model_executor/layers/fused_moe/fused_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8444881Z #34 
4.042 copying vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8446129Z #34 4.042 copying vllm/model_executor/layers/fused_moe/layer.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8447366Z #34 4.042 copying vllm/model_executor/layers/fused_moe/modular_kernel.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8448644Z #34 4.042 copying vllm/model_executor/layers/fused_moe/moe_align_block_size.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8449906Z #34 4.043 copying vllm/model_executor/layers/fused_moe/moe_pallas.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8451166Z #34 4.043 copying vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8452543Z #34 4.043 copying vllm/model_executor/layers/fused_moe/moe_torch_iterative.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8454045Z #34 4.043 copying vllm/model_executor/layers/fused_moe/pplx_prepare_finalize.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8455387Z #34 4.044 copying vllm/model_executor/layers/fused_moe/prepare_finalize.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8456709Z #34 4.044 copying vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8458032Z #34 4.044 copying vllm/model_executor/layers/fused_moe/routing_simulator.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8459379Z #34 4.044 copying 
vllm/model_executor/layers/fused_moe/topk_weight_and_reduce.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8460759Z #34 4.045 copying vllm/model_executor/layers/fused_moe/triton_deep_gemm_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8462052Z #34 4.045 copying vllm/model_executor/layers/fused_moe/trtllm_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8463278Z #34 4.045 copying vllm/model_executor/layers/fused_moe/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe 2025-09-07T06:54:18.8464229Z #34 4.046 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T06:54:18.8465441Z #34 4.046 copying vllm/model_executor/layers/mamba/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T06:54:18.8466592Z #34 4.046 copying vllm/model_executor/layers/mamba/abstract.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T06:54:18.8467729Z #34 4.046 copying vllm/model_executor/layers/mamba/linear_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T06:54:18.8468869Z #34 4.047 copying vllm/model_executor/layers/mamba/mamba2_metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T06:54:18.8470003Z #34 4.047 copying vllm/model_executor/layers/mamba/mamba_mixer.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T06:54:18.8471119Z #34 4.047 copying vllm/model_executor/layers/mamba/mamba_mixer2.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T06:54:18.8472245Z #34 4.047 copying vllm/model_executor/layers/mamba/mamba_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T06:54:18.8473350Z #34 4.048 copying 
vllm/model_executor/layers/mamba/short_conv.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba 2025-09-07T06:54:18.8474278Z #34 4.048 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8475256Z #34 4.048 copying vllm/model_executor/layers/quantization/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8476515Z #34 4.048 copying vllm/model_executor/layers/quantization/auto_round.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8477737Z #34 4.049 copying vllm/model_executor/layers/quantization/awq.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8478994Z #34 4.049 copying vllm/model_executor/layers/quantization/awq_marlin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8480246Z #34 4.049 copying vllm/model_executor/layers/quantization/awq_triton.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8481505Z #34 4.050 copying vllm/model_executor/layers/quantization/base_config.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8482743Z #34 4.050 copying vllm/model_executor/layers/quantization/bitblas.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8484013Z #34 4.050 copying vllm/model_executor/layers/quantization/bitsandbytes.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8485292Z #34 4.050 copying vllm/model_executor/layers/quantization/deepgemm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8486560Z #34 4.051 copying vllm/model_executor/layers/quantization/deepspeedfp.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8487840Z #34 4.051 copying vllm/model_executor/layers/quantization/experts_int8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8489142Z #34 4.051 copying vllm/model_executor/layers/quantization/fbgemm_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8490361Z #34 4.051 copying vllm/model_executor/layers/quantization/fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8491560Z #34 4.052 copying vllm/model_executor/layers/quantization/gguf.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8493341Z #34 4.052 copying vllm/model_executor/layers/quantization/gptq.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8494660Z #34 4.052 copying vllm/model_executor/layers/quantization/gptq_bitblas.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8496012Z #34 4.053 copying vllm/model_executor/layers/quantization/gptq_marlin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8497413Z #34 4.053 copying vllm/model_executor/layers/quantization/gptq_marlin_24.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8498758Z #34 4.053 copying vllm/model_executor/layers/quantization/hqq_marlin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8500056Z #34 4.053 copying vllm/model_executor/layers/quantization/inc.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8501373Z #34 4.054 copying vllm/model_executor/layers/quantization/input_quant_fp8.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8502748Z #34 4.054 copying vllm/model_executor/layers/quantization/ipex_quant.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8504085Z #34 4.054 copying vllm/model_executor/layers/quantization/kv_cache.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8505476Z #34 4.054 copying vllm/model_executor/layers/quantization/modelopt.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8506670Z #34 4.055 copying vllm/model_executor/layers/quantization/moe_wna16.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8507887Z #34 4.055 copying vllm/model_executor/layers/quantization/mxfp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8509076Z #34 4.055 copying vllm/model_executor/layers/quantization/petit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8510660Z #34 4.055 copying vllm/model_executor/layers/quantization/ptpc_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8511957Z #34 4.056 copying vllm/model_executor/layers/quantization/rtn.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8513214Z #34 4.056 copying vllm/model_executor/layers/quantization/schema.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8514510Z #34 4.056 copying vllm/model_executor/layers/quantization/torchao.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8515790Z #34 4.056 copying vllm/model_executor/layers/quantization/tpu_int8.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization 2025-09-07T06:54:18.8516841Z #34 4.057 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8517917Z #34 4.057 copying vllm/model_executor/layers/rotary_embedding/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8519264Z #34 4.057 copying vllm/model_executor/layers/rotary_embedding/base.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8520594Z #34 4.058 copying vllm/model_executor/layers/rotary_embedding/common.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8522003Z #34 4.058 copying vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8523431Z #34 4.058 copying vllm/model_executor/layers/rotary_embedding/dual_chunk_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8524878Z #34 4.058 copying vllm/model_executor/layers/rotary_embedding/dynamic_ntk_alpha_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8526480Z #34 4.059 copying vllm/model_executor/layers/rotary_embedding/dynamic_ntk_scaling_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8527892Z #34 4.059 copying vllm/model_executor/layers/rotary_embedding/ernie45_vl_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8529281Z #34 4.059 copying vllm/model_executor/layers/rotary_embedding/linear_scaling_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8530646Z #34 4.059 copying 
vllm/model_executor/layers/rotary_embedding/llama3_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8532018Z #34 4.060 copying vllm/model_executor/layers/rotary_embedding/llama4_vision_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8533827Z #34 4.060 copying vllm/model_executor/layers/rotary_embedding/mrope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8535233Z #34 4.060 copying vllm/model_executor/layers/rotary_embedding/ntk_scaling_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8536741Z #34 4.060 copying vllm/model_executor/layers/rotary_embedding/phi3_long_rope_scaled_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8538293Z #34 4.061 copying vllm/model_executor/layers/rotary_embedding/yarn_scaling_rope.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding 2025-09-07T06:54:18.8539430Z #34 4.061 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/shared_fused_moe 2025-09-07T06:54:18.8540559Z #34 4.061 copying vllm/model_executor/layers/shared_fused_moe/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/shared_fused_moe 2025-09-07T06:54:18.8541963Z #34 4.061 copying vllm/model_executor/layers/shared_fused_moe/shared_fused_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/shared_fused_moe 2025-09-07T06:54:18.8543059Z #34 4.062 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T06:54:18.8544073Z #34 4.062 copying vllm/model_executor/layers/mamba/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T06:54:18.8545451Z #34 4.063 copying 
vllm/model_executor/layers/mamba/ops/causal_conv1d.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T06:54:18.8546741Z #34 4.063 copying vllm/model_executor/layers/mamba/ops/layernorm_gated.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T06:54:18.8548014Z #34 4.063 copying vllm/model_executor/layers/mamba/ops/mamba_ssm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T06:54:18.8549221Z #34 4.063 copying vllm/model_executor/layers/mamba/ops/ssd_bmm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T06:54:18.8550486Z #34 4.064 copying vllm/model_executor/layers/mamba/ops/ssd_chunk_scan.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T06:54:18.8551756Z #34 4.064 copying vllm/model_executor/layers/mamba/ops/ssd_chunk_state.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T06:54:18.8553017Z #34 4.064 copying vllm/model_executor/layers/mamba/ops/ssd_combined.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T06:54:18.8554298Z #34 4.064 copying vllm/model_executor/layers/mamba/ops/ssd_state_passing.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops 2025-09-07T06:54:18.8555422Z #34 4.065 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T06:54:18.8556765Z #34 4.065 copying vllm/model_executor/layers/quantization/compressed_tensors/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T06:54:18.8558508Z #34 4.065 copying vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T06:54:18.8560323Z #34 4.065 copying 
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T06:54:18.8562116Z #34 4.066 copying vllm/model_executor/layers/quantization/compressed_tensors/triton_scaled_mm.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T06:54:18.8563829Z #34 4.066 copying vllm/model_executor/layers/quantization/compressed_tensors/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T06:54:18.8565070Z #34 4.066 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels 2025-09-07T06:54:18.8566216Z #34 4.067 copying vllm/model_executor/layers/quantization/kernels/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels 2025-09-07T06:54:18.8567345Z #34 4.067 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark 2025-09-07T06:54:18.8568497Z #34 4.067 copying vllm/model_executor/layers/quantization/quark/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark 2025-09-07T06:54:18.8569900Z #34 4.067 copying vllm/model_executor/layers/quantization/quark/quark.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark 2025-09-07T06:54:18.8571335Z #34 4.068 copying vllm/model_executor/layers/quantization/quark/quark_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark 2025-09-07T06:54:18.8573021Z #34 4.068 copying vllm/model_executor/layers/quantization/quark/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark 2025-09-07T06:54:18.8574181Z #34 4.068 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8575325Z #34 4.069 copying 
vllm/model_executor/layers/quantization/utils/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8576827Z #34 4.069 copying vllm/model_executor/layers/quantization/utils/allspark_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8578372Z #34 4.069 copying vllm/model_executor/layers/quantization/utils/bitblas_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8579912Z #34 4.069 copying vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8581523Z #34 4.070 copying vllm/model_executor/layers/quantization/utils/flashinfer_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8583038Z #34 4.070 copying vllm/model_executor/layers/quantization/utils/fp8_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8584510Z #34 4.070 copying vllm/model_executor/layers/quantization/utils/gptq_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8586076Z #34 4.071 copying vllm/model_executor/layers/quantization/utils/int8_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8587515Z #34 4.071 copying vllm/model_executor/layers/quantization/utils/layer_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8589007Z #34 4.071 copying vllm/model_executor/layers/quantization/utils/machete_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8590480Z #34 4.071 copying vllm/model_executor/layers/quantization/utils/marlin_utils.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8592089Z #34 4.072 copying vllm/model_executor/layers/quantization/utils/marlin_utils_fp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8593799Z #34 4.072 copying vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8595355Z #34 4.072 copying vllm/model_executor/layers/quantization/utils/marlin_utils_test.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8596917Z #34 4.072 copying vllm/model_executor/layers/quantization/utils/marlin_utils_test_24.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8598456Z #34 4.073 copying vllm/model_executor/layers/quantization/utils/mxfp4_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8600013Z #34 4.073 copying vllm/model_executor/layers/quantization/utils/mxfp8_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8601547Z #34 4.073 copying vllm/model_executor/layers/quantization/utils/nvfp4_emulation_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8603166Z #34 4.073 copying vllm/model_executor/layers/quantization/utils/nvfp4_moe_support.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8604810Z #34 4.074 copying vllm/model_executor/layers/quantization/utils/petit_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8606258Z #34 4.074 copying vllm/model_executor/layers/quantization/utils/quant_utils.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8607699Z #34 4.074 copying vllm/model_executor/layers/quantization/utils/w8a8_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils 2025-09-07T06:54:18.8608931Z #34 4.075 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8610370Z #34 4.075 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8612283Z #34 4.075 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_24.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8614630Z #34 4.075 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8616739Z #34 4.076 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_24.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8618864Z #34 4.076 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_nvfp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8621001Z #34 4.076 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8623147Z #34 4.076 copying 
vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8625321Z #34 4.077 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_int.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8627359Z #34 4.077 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8629382Z #34 4.077 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8631420Z #34 4.077 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8633450Z #34 4.078 copying vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T06:54:18.8635009Z #34 4.078 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T06:54:18.8636520Z #34 4.078 copying vllm/model_executor/layers/quantization/compressed_tensors/transform/linear.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T06:54:18.8638408Z #34 4.078 copying vllm/model_executor/layers/quantization/compressed_tensors/transform/module.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T06:54:18.8640282Z #34 4.079 copying vllm/model_executor/layers/quantization/compressed_tensors/transform/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T06:54:18.8641802Z #34 4.079 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes 2025-09-07T06:54:18.8643474Z #34 4.079 copying vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes/linear_qutlass_nvfp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes 2025-09-07T06:54:18.8645086Z #34 4.080 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8646580Z #34 4.080 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/MPLinearKernel.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8648407Z #34 4.080 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8650183Z #34 4.080 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/allspark.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8651973Z #34 4.081 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/bitblas.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8654037Z #34 4.081 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/conch.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8655894Z #34 4.081 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/cutlass.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8657759Z #34 4.081 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/dynamic_4bit.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8659620Z #34 4.082 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/exllama.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8661446Z #34 4.082 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/machete.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8663281Z #34 4.082 copying vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T06:54:18.8664794Z #34 4.083 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T06:54:18.8666172Z #34 4.083 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/ScaledMMLinearKernel.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T06:54:18.8667935Z #34 4.083 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T06:54:18.8669596Z #34 4.083 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/aiter.py -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T06:54:18.8671207Z #34 4.084 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T06:54:18.8672834Z #34 4.084 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/cutlass.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T06:54:18.8674798Z #34 4.084 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/triton.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T06:54:18.8676383Z #34 4.084 copying vllm/model_executor/layers/quantization/kernels/scaled_mm/xla.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T06:54:18.8677601Z #34 4.085 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T06:54:18.8678781Z #34 4.085 copying vllm/model_executor/layers/quantization/quark/schemes/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T06:54:18.8680367Z #34 4.085 copying vllm/model_executor/layers/quantization/quark/schemes/quark_scheme.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T06:54:18.8681969Z #34 4.085 copying vllm/model_executor/layers/quantization/quark/schemes/quark_w4a4_mxfp4.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T06:54:18.8683559Z #34 4.086 copying vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T06:54:18.8685157Z #34 4.086 copying 
vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_int8.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T06:54:18.8686375Z #34 4.086 creating build/lib.linux-x86_64-cpython-312/vllm/plugins/io_processors 2025-09-07T06:54:18.8687157Z #34 4.087 copying vllm/plugins/io_processors/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/plugins/io_processors 2025-09-07T06:54:18.8688092Z #34 4.087 copying vllm/plugins/io_processors/interface.py -> build/lib.linux-x86_64-cpython-312/vllm/plugins/io_processors 2025-09-07T06:54:18.8688859Z #34 4.087 creating build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers 2025-09-07T06:54:18.8689637Z #34 4.088 copying vllm/plugins/lora_resolvers/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers 2025-09-07T06:54:18.8690639Z #34 4.088 copying vllm/plugins/lora_resolvers/filesystem_resolver.py -> build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers 2025-09-07T06:54:18.8691509Z #34 4.088 creating build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T06:54:18.8692989Z #34 4.088 copying vllm/transformers_utils/chat_templates/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T06:54:18.8694279Z #34 4.089 copying vllm/transformers_utils/chat_templates/registry.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T06:54:18.8695272Z #34 4.089 creating build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8696198Z #34 4.089 copying vllm/transformers_utils/configs/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8697386Z #34 4.089 copying vllm/transformers_utils/configs/arctic.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8698529Z #34 4.090 copying 
vllm/transformers_utils/configs/chatglm.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8699741Z #34 4.090 copying vllm/transformers_utils/configs/deepseek_vl2.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8700888Z #34 4.090 copying vllm/transformers_utils/configs/eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8702019Z #34 4.090 copying vllm/transformers_utils/configs/falcon.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8703134Z #34 4.091 copying vllm/transformers_utils/configs/jais.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8704256Z #34 4.091 copying vllm/transformers_utils/configs/kimi_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8705559Z #34 4.091 copying vllm/transformers_utils/configs/medusa.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8706591Z #34 4.091 copying vllm/transformers_utils/configs/midashenglm.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8707638Z #34 4.092 copying vllm/transformers_utils/configs/mistral.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8708726Z #34 4.092 copying vllm/transformers_utils/configs/mlp_speculator.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8709763Z #34 4.092 copying vllm/transformers_utils/configs/moonvit.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8710788Z #34 4.092 copying vllm/transformers_utils/configs/nemotron.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8711814Z #34 4.093 copying vllm/transformers_utils/configs/nemotron_h.py -> 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8712861Z #34 4.093 copying vllm/transformers_utils/configs/nemotron_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8713880Z #34 4.093 copying vllm/transformers_utils/configs/ovis.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8714868Z #34 4.093 copying vllm/transformers_utils/configs/step3_vl.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8715929Z #34 4.094 copying vllm/transformers_utils/configs/ultravox.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs 2025-09-07T06:54:18.8716770Z #34 4.094 creating build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors 2025-09-07T06:54:18.8717645Z #34 4.094 copying vllm/transformers_utils/processors/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors 2025-09-07T06:54:18.8718750Z #34 4.094 copying vllm/transformers_utils/processors/deepseek_vl2.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors 2025-09-07T06:54:18.8719832Z #34 4.095 copying vllm/transformers_utils/processors/ovis.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors 2025-09-07T06:54:18.8720899Z #34 4.095 copying vllm/transformers_utils/processors/ovis2_5.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors 2025-09-07T06:54:18.8721772Z #34 4.095 creating build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizers 2025-09-07T06:54:18.8722634Z #34 4.095 copying vllm/transformers_utils/tokenizers/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizers 2025-09-07T06:54:18.8723709Z #34 4.096 copying vllm/transformers_utils/tokenizers/mistral.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizers 2025-09-07T06:54:18.8725042Z #34 4.096 creating 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators 2025-09-07T06:54:18.8726040Z #34 4.096 copying vllm/transformers_utils/configs/speculators/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators 2025-09-07T06:54:18.8727318Z #34 4.096 copying vllm/transformers_utils/configs/speculators/algos.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators 2025-09-07T06:54:18.8728557Z #34 4.097 copying vllm/transformers_utils/configs/speculators/base.py -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators 2025-09-07T06:54:18.8729448Z #34 4.097 creating build/lib.linux-x86_64-cpython-312/vllm/v1/attention 2025-09-07T06:54:18.8730080Z #34 4.097 copying vllm/v1/attention/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention 2025-09-07T06:54:18.8730713Z #34 4.098 creating build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T06:54:18.8731290Z #34 4.098 copying vllm/v1/core/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T06:54:18.8731968Z #34 4.098 copying vllm/v1/core/block_pool.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T06:54:18.8733159Z #34 4.098 copying vllm/v1/core/encoder_cache_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T06:54:18.8734047Z #34 4.098 copying vllm/v1/core/kv_cache_coordinator.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T06:54:18.8734908Z #34 4.099 copying vllm/v1/core/kv_cache_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T06:54:18.8735774Z #34 4.099 copying vllm/v1/core/kv_cache_utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T06:54:18.8736651Z #34 4.099 copying vllm/v1/core/single_type_kv_cache_manager.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core 2025-09-07T06:54:18.8737425Z #34 4.100 creating build/lib.linux-x86_64-cpython-312/vllm/v1/engine 
2025-09-07T06:54:18.8738087Z #34 4.100 copying vllm/v1/engine/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8738886Z #34 4.100 copying vllm/v1/engine/async_llm.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8739726Z #34 4.100 copying vllm/v1/engine/coordinator.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8740532Z #34 4.101 copying vllm/v1/engine/core.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8741338Z #34 4.101 copying vllm/v1/engine/core_client.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8742206Z #34 4.101 copying vllm/v1/engine/detokenizer.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8743057Z #34 4.102 copying vllm/v1/engine/exceptions.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8743897Z #34 4.102 copying vllm/v1/engine/llm_engine.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8744710Z #34 4.102 copying vllm/v1/engine/logprobs.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8745664Z #34 4.102 copying vllm/v1/engine/output_processor.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8746534Z #34 4.103 copying vllm/v1/engine/parallel_sampling.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8747384Z #34 4.103 copying vllm/v1/engine/processor.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8748169Z #34 4.103 copying vllm/v1/engine/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/engine 2025-09-07T06:54:18.8748829Z #34 4.103 creating build/lib.linux-x86_64-cpython-312/vllm/v1/executor 2025-09-07T06:54:18.8749513Z #34 4.104 copying vllm/v1/executor/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/executor 2025-09-07T06:54:18.8750312Z #34 4.104 copying vllm/v1/executor/abstract.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/executor 2025-09-07T06:54:18.8751225Z #34 4.104 copying vllm/v1/executor/multiproc_executor.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/executor 2025-09-07T06:54:18.8752177Z #34 4.104 copying vllm/v1/executor/ray_distributed_executor.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/executor 2025-09-07T06:54:18.8753004Z #34 4.105 creating build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T06:54:18.8753671Z #34 4.105 copying vllm/v1/metrics/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T06:54:18.8754449Z #34 4.105 copying vllm/v1/metrics/loggers.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T06:54:18.8755271Z #34 4.105 copying vllm/v1/metrics/prometheus.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T06:54:18.8756206Z #34 4.106 copying vllm/v1/metrics/ray_wrappers.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T06:54:18.8757176Z #34 4.106 copying vllm/v1/metrics/reader.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T06:54:18.8757970Z #34 4.106 copying vllm/v1/metrics/stats.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/metrics 2025-09-07T06:54:18.8758624Z #34 4.107 creating build/lib.linux-x86_64-cpython-312/vllm/v1/pool 2025-09-07T06:54:18.8759259Z #34 4.107 copying vllm/v1/pool/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/pool 2025-09-07T06:54:18.8760000Z #34 4.107 copying vllm/v1/pool/metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/pool 2025-09-07T06:54:18.8760670Z #34 4.107 creating build/lib.linux-x86_64-cpython-312/vllm/v1/sample 2025-09-07T06:54:18.8761334Z #34 4.108 copying vllm/v1/sample/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample 2025-09-07T06:54:18.8762141Z #34 4.108 copying vllm/v1/sample/metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample 2025-09-07T06:54:18.8762992Z #34 4.108 copying vllm/v1/sample/rejection_sampler.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/sample 2025-09-07T06:54:18.8763831Z #34 4.108 copying vllm/v1/sample/sampler.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample 2025-09-07T06:54:18.8764535Z #34 4.109 creating build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T06:54:18.8765266Z #34 4.109 copying vllm/v1/spec_decode/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T06:54:18.8766104Z #34 4.109 copying vllm/v1/spec_decode/eagle.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T06:54:18.8766952Z #34 4.109 copying vllm/v1/spec_decode/medusa.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T06:54:18.8767838Z #34 4.109 copying vllm/v1/spec_decode/metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T06:54:18.8768821Z #34 4.110 copying vllm/v1/spec_decode/metrics.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T06:54:18.8769693Z #34 4.110 copying vllm/v1/spec_decode/ngram_proposer.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T06:54:18.8770541Z #34 4.110 copying vllm/v1/spec_decode/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode 2025-09-07T06:54:18.8771265Z #34 4.110 creating build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T06:54:18.8772055Z #34 4.110 copying vllm/v1/structured_output/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T06:54:18.8773326Z #34 4.111 copying vllm/v1/structured_output/backend_guidance.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T06:54:18.8774506Z #34 4.111 copying vllm/v1/structured_output/backend_lm_format_enforcer.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T06:54:18.8775668Z #34 4.111 copying vllm/v1/structured_output/backend_outlines.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 
2025-09-07T06:54:18.8776781Z #34 4.111 copying vllm/v1/structured_output/backend_types.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T06:54:18.8777903Z #34 4.112 copying vllm/v1/structured_output/backend_xgrammar.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T06:54:18.8778982Z #34 4.112 copying vllm/v1/structured_output/request.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T06:54:18.8780031Z #34 4.112 copying vllm/v1/structured_output/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output 2025-09-07T06:54:18.8780809Z #34 4.112 creating build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8781487Z #34 4.112 copying vllm/v1/worker/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8782293Z #34 4.113 copying vllm/v1/worker/block_table.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8783159Z #34 4.113 copying vllm/v1/worker/cpu_model_runner.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8784010Z #34 4.113 copying vllm/v1/worker/cpu_worker.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8784857Z #34 4.113 copying vllm/v1/worker/gpu_input_batch.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8785804Z #34 4.114 copying vllm/v1/worker/gpu_model_runner.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8786598Z #34 4.114 copying vllm/v1/worker/gpu_worker.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8787468Z #34 4.114 copying vllm/v1/worker/kv_connector_model_runner_mixin.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8788402Z #34 4.114 copying vllm/v1/worker/lora_model_runner_mixin.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8789281Z #34 4.115 copying vllm/v1/worker/tpu_input_batch.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8790103Z #34 4.115 copying vllm/v1/worker/tpu_model_runner.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8790896Z #34 4.115 copying vllm/v1/worker/tpu_worker.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8791659Z #34 4.115 copying vllm/v1/worker/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8792761Z #34 4.116 copying vllm/v1/worker/worker_base.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8793616Z #34 4.116 copying vllm/v1/worker/xpu_model_runner.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8794468Z #34 4.116 copying vllm/v1/worker/xpu_worker.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/worker 2025-09-07T06:54:18.8795212Z #34 4.116 creating build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8796145Z #34 4.117 copying vllm/v1/attention/backends/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8797208Z #34 4.117 copying vllm/v1/attention/backends/cpu_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8798284Z #34 4.117 copying vllm/v1/attention/backends/flash_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8799400Z #34 4.117 copying vllm/v1/attention/backends/flashinfer.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8800522Z #34 4.118 copying vllm/v1/attention/backends/flex_attention.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8801661Z #34 4.118 copying vllm/v1/attention/backends/linear_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8802777Z #34 4.118 copying vllm/v1/attention/backends/mamba1_attn.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8803875Z #34 4.118 copying vllm/v1/attention/backends/mamba2_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8805071Z #34 4.118 copying vllm/v1/attention/backends/mamba_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8806122Z #34 4.119 copying vllm/v1/attention/backends/pallas.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8807146Z #34 4.119 copying vllm/v1/attention/backends/rocm_aiter_fa.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8808252Z #34 4.119 copying vllm/v1/attention/backends/short_conv_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8809289Z #34 4.119 copying vllm/v1/attention/backends/tree_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8810325Z #34 4.120 copying vllm/v1/attention/backends/triton_attn.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8811341Z #34 4.120 copying vllm/v1/attention/backends/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8812338Z #34 4.120 copying vllm/v1/attention/backends/xformers.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends 2025-09-07T06:54:18.8813441Z #34 4.120 creating build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T06:54:18.8814355Z #34 4.120 copying vllm/v1/attention/backends/mla/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T06:54:18.8815499Z #34 4.121 copying vllm/v1/attention/backends/mla/common.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T06:54:18.8816673Z #34 4.121 copying vllm/v1/attention/backends/mla/cutlass_mla.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T06:54:18.8817922Z #34 4.121 copying vllm/v1/attention/backends/mla/flashattn_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T06:54:18.8819127Z #34 4.121 copying vllm/v1/attention/backends/mla/flashmla.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T06:54:18.8820312Z #34 4.122 copying vllm/v1/attention/backends/mla/rocm_aiter_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T06:54:18.8821511Z #34 4.122 copying vllm/v1/attention/backends/mla/triton_mla.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla 2025-09-07T06:54:18.8822410Z #34 4.122 creating build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T06:54:18.8823131Z #34 4.122 copying vllm/v1/core/sched/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T06:54:18.8824046Z #34 4.122 copying vllm/v1/core/sched/async_scheduler.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T06:54:18.8825232Z #34 4.123 copying vllm/v1/core/sched/interface.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T06:54:18.8826033Z #34 4.123 copying vllm/v1/core/sched/output.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T06:54:18.8826848Z #34 4.123 copying vllm/v1/core/sched/request_queue.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T06:54:18.8827658Z #34 4.123 copying vllm/v1/core/sched/scheduler.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T06:54:18.8828445Z #34 4.124 copying vllm/v1/core/sched/utils.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched 2025-09-07T06:54:18.8829138Z #34 4.124 creating build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor 2025-09-07T06:54:18.8829960Z #34 4.124 copying vllm/v1/sample/logits_processor/__init__.py -> 
build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor 2025-09-07T06:54:18.8830968Z #34 4.124 copying vllm/v1/sample/logits_processor/builtin.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor 2025-09-07T06:54:18.8831985Z #34 4.124 copying vllm/v1/sample/logits_processor/interface.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor 2025-09-07T06:54:18.8833030Z #34 4.125 copying vllm/v1/sample/logits_processor/state.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor 2025-09-07T06:54:18.8833790Z #34 4.125 creating build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T06:54:18.8834448Z #34 4.125 copying vllm/v1/sample/ops/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T06:54:18.8835269Z #34 4.125 copying vllm/v1/sample/ops/bad_words.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T06:54:18.8836062Z #34 4.125 copying vllm/v1/sample/ops/logprobs.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T06:54:18.8836881Z #34 4.126 copying vllm/v1/sample/ops/penalties.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T06:54:18.8837721Z #34 4.126 copying vllm/v1/sample/ops/topk_topp_sampler.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops 2025-09-07T06:54:18.8838434Z #34 4.126 creating build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu 2025-09-07T06:54:18.8839086Z #34 4.126 copying vllm/v1/sample/tpu/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu 2025-09-07T06:54:19.0086423Z #34 4.127 copying vllm/v1/sample/tpu/metadata.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu 2025-09-07T06:54:19.0087375Z #34 4.127 copying vllm/v1/sample/tpu/sampler.py -> build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu 2025-09-07T06:54:19.0088015Z #34 4.127 running egg_info 2025-09-07T06:54:19.0088327Z #34 4.139 creating vllm.egg-info 2025-09-07T06:54:19.0088660Z #34 4.139 
writing vllm.egg-info/PKG-INFO 2025-09-07T06:54:19.0089146Z #34 4.141 writing dependency_links to vllm.egg-info/dependency_links.txt 2025-09-07T06:54:19.0089717Z #34 4.142 writing entry points to vllm.egg-info/entry_points.txt 2025-09-07T06:54:19.0090415Z #34 4.146 writing requirements to vllm.egg-info/requires.txt 2025-09-07T06:54:19.0090917Z #34 4.146 writing top-level names to vllm.egg-info/top_level.txt 2025-09-07T06:54:19.0091428Z #34 4.146 writing manifest file 'vllm.egg-info/SOURCES.txt' 2025-09-07T06:54:19.2726162Z #34 4.560 reading manifest template 'MANIFEST.in' 2025-09-07T06:54:19.3727621Z #34 4.567 adding license file 'LICENSE' 2025-09-07T06:54:19.3728179Z #34 4.598 writing manifest file 'vllm.egg-info/SOURCES.txt' 2025-09-07T06:54:19.3728792Z #34 4.629 copying vllm/py.typed -> build/lib.linux-x86_64-cpython-312/vllm 2025-09-07T06:54:19.3729534Z #34 4.629 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3730852Z #34 4.629 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3733076Z #34 4.629 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3734882Z #34 4.630 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3736650Z #34 4.630 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3738436Z #34 4.630 copying 
vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3740282Z #34 4.630 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3742026Z #34 4.631 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3743855Z #34 4.631 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3745806Z #34 4.631 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3747522Z #34 4.631 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3749246Z #34 4.632 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3750958Z #34 4.632 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3752657Z #34 4.632 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3754311Z #34 4.632 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3756027Z #34 4.633 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3757618Z #34 4.633 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3759190Z #34 4.633 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3760754Z #34 4.633 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3762429Z #34 4.633 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=352,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3764268Z #34 4.634 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3766159Z #34 4.634 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3767977Z #34 4.634 copying 
vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3769687Z #34 4.634 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3771265Z #34 4.635 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3773217Z #34 4.635 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3775025Z #34 4.635 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3776717Z #34 4.635 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3778424Z #34 4.636 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3780209Z #34 4.636 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3782115Z #34 4.636 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3784054Z #34 4.636 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3786055Z #34 4.636 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3787893Z #34 4.637 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3789586Z #34 4.637 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3791281Z #34 4.637 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3793349Z #34 4.637 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3795006Z #34 4.638 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=96,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3796657Z #34 4.638 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3798372Z #34 
4.638 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3800040Z #34 4.638 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3801659Z #34 4.639 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_H100.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3803318Z #34 4.639 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3805149Z #34 4.639 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3806792Z #34 4.639 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3808555Z #34 4.639 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3810284Z #34 4.640 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3812018Z #34 4.640 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3814013Z #34 4.640 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3815702Z #34 4.640 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3817445Z #34 4.641 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3819210Z #34 4.641 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3821039Z #34 4.641 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3822861Z #34 4.641 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3824837Z #34 4.642 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3826555Z #34 4.642 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3828251Z #34 4.642 
copying vllm/model_executor/layers/fused_moe/configs/E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3830027Z #34 4.642 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3831763Z #34 4.643 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3833473Z #34 4.643 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3835236Z #34 4.643 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3836983Z #34 4.643 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3838612Z #34 4.643 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3840199Z #34 4.644 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=320,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3841898Z #34 4.644 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3843761Z #34 4.644 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3845606Z #34 4.644 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3847473Z #34 4.645 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3849299Z #34 4.645 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3851142Z #34 4.645 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3853237Z #34 4.645 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3855159Z #34 4.645 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325X,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3857093Z #34 4.646 copying 
vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3859110Z #34 4.646 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3861029Z #34 4.646 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3862962Z #34 4.646 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3864998Z #34 4.647 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3866880Z #34 4.647 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3868785Z #34 4.647 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3870626Z #34 4.647 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T06:54:19.3872493Z #34 4.648 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3874410Z #34 4.648 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3876360Z #34 4.648 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3878289Z #34 4.648 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3880117Z #34 4.649 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3881954Z #34 4.649 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3883799Z #34 4.649 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3885658Z #34 4.649 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3887497Z #34 4.650 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3889378Z #34 4.650 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3891166Z #34 4.650 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3893283Z #34 4.650 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3895185Z #34 4.651 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3897142Z #34 4.651 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3898914Z #34 4.651 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=1408,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3900635Z #34 4.651 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=176,device_name=AMD_Instinct_MI300X.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3902315Z #34 4.651 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=352,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3903977Z #34 4.652 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=704,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3905741Z #34 4.652 copying vllm/model_executor/layers/fused_moe/configs/E=62,N=256,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3907378Z #34 4.652 copying vllm/model_executor/layers/fused_moe/configs/E=62,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3909050Z #34 4.652 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3910700Z #34 4.653 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3912408Z #34 4.653 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3914098Z #34 4.653 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3915759Z #34 4.653 copying 
vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3917431Z #34 4.654 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3919043Z #34 4.654 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3920754Z #34 4.654 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3922474Z #34 4.654 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3924091Z #34 4.654 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3925709Z #34 4.655 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3927352Z #34 4.655 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3928984Z #34 4.655 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3930702Z #34 4.655 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3932351Z #34 4.656 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3934235Z #34 4.656 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3935895Z #34 4.656 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3937553Z #34 4.656 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3939220Z #34 4.656 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3940915Z #34 4.657 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3942681Z #34 4.657 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3944479Z #34 4.657 copying 
vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3946288Z #34 4.657 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3947973Z #34 4.658 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3949563Z #34 4.658 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3951170Z #34 4.658 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3952778Z #34 4.658 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3954305Z #34 4.659 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=896,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3955897Z #34 4.659 copying vllm/model_executor/layers/fused_moe/configs/E=72,N=384,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3957554Z #34 4.659 copying vllm/model_executor/layers/fused_moe/configs/E=72,N=768,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3959242Z 
#34 4.659 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3960970Z #34 4.660 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3962670Z #34 4.660 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3964352Z #34 4.660 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3966054Z #34 4.660 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3967783Z #34 4.660 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.3969428Z #34 4.661 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4727966Z #34 4.661 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4729941Z #34 4.661 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4731661Z #34 4.661 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4734069Z #34 4.662 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4735992Z #34 4.662 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4737755Z #34 4.662 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4739624Z #34 4.662 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4741721Z #34 4.662 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4743421Z #34 4.663 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4745216Z #34 4.663 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4747160Z #34 4.663 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4749073Z #34 4.663 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4750970Z #34 4.664 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4752777Z #34 4.664 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4754632Z #34 4.664 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4756719Z #34 4.664 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4758666Z #34 4.664 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4771663Z #34 4.665 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4773820Z #34 4.665 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4775796Z #34 4.665 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4777711Z #34 4.665 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4779703Z #34 4.666 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4781640Z #34 4.666 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4783532Z #34 4.666 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4785459Z #34 4.666 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4787479Z #34 4.667 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4789441Z #34 4.667 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4791411Z #34 4.667 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4793551Z #34 4.667 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4795334Z #34 4.668 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4797392Z #34 4.668 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4799294Z #34 4.668 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4801110Z #34 4.668 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4802909Z #34 4.668 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4804966Z #34 4.669 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_L40S.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4806983Z #34 4.669 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4808869Z #34 4.669 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4810574Z #34 4.669 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4812581Z #34 4.670 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4814965Z #34 4.670 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4817203Z #34 4.670 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4819204Z #34 4.670 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4821077Z #34 4.671 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4822758Z #34 4.671 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4824615Z #34 4.671 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4826555Z #34 4.671 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4828552Z #34 4.672 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4830398Z #34 4.672 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4832396Z #34 4.672 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4834288Z #34 4.672 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4836217Z #34 4.672 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4837963Z #34 4.673 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4839974Z #34 4.673 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4842176Z #34 4.673 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4844222Z #34 4.673 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4846085Z #34 4.674 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4847879Z #34 4.674 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4849686Z #34 4.674 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4851673Z #34 4.674 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.4853384Z #34 4.675 creating build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4854988Z #34 4.675 copying vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4857543Z #34 4.675 copying 
vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4860070Z #34 4.675 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4862457Z #34 4.675 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4865072Z #34 4.676 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4867661Z #34 4.676 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4870154Z #34 4.676 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4872541Z #34 4.676 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4874839Z #34 4.676 copying 
vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4877156Z #34 4.677 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4879706Z #34 4.677 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4882203Z #34 4.677 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4884603Z #34 4.677 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4887030Z #34 4.678 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4889558Z #34 4.678 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4892054Z #34 4.678 copying 
vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4894801Z #34 4.678 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4897114Z #34 4.679 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4899674Z #34 4.679 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4901964Z #34 4.679 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4904421Z #34 4.679 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4906926Z #34 4.679 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4909304Z #34 4.680 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4911776Z #34 4.680 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4914233Z #34 4.680 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4916499Z #34 4.680 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4918747Z #34 4.681 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4920953Z #34 4.681 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4923209Z #34 4.681 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4925556Z #34 4.681 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4927845Z #34 4.682 copying vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4930285Z #34 4.682 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4932724Z #34 4.682 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4935376Z #34 4.682 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4937806Z #34 4.682 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4940354Z #34 4.683 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4942875Z #34 4.683 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4945458Z #34 4.683 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4947820Z #34 4.683 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4950144Z #34 4.684 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4952488Z #34 4.684 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4954984Z #34 4.684 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4957178Z #34 4.684 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4959354Z #34 4.685 copying 
vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4961659Z #34 4.685 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4964163Z #34 4.685 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4966746Z #34 4.685 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4969071Z #34 4.685 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4971436Z #34 4.686 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4974303Z #34 4.686 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4976860Z #34 4.686 copying 
vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4979449Z #34 4.686 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4981796Z #34 4.687 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4984483Z #34 4.687 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4986939Z #34 4.687 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4989487Z #34 4.687 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4992079Z #34 4.688 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4994931Z #34 4.688 copying 
vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4997507Z #34 4.688 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.4999898Z #34 4.688 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5002185Z #34 4.688 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5004791Z #34 4.689 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5007007Z #34 4.689 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5009345Z #34 4.689 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5011843Z #34 4.689 copying 
vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5014658Z #34 4.690 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5017147Z #34 4.690 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5019504Z #34 4.690 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5021889Z #34 4.690 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5024162Z #34 4.691 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5026473Z #34 4.691 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5028805Z #34 4.691 copying 
vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5031165Z #34 4.691 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5033472Z #34 4.691 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5035758Z #34 4.692 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5038137Z #34 4.692 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5040540Z #34 4.692 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5042718Z #34 4.692 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5044920Z #34 4.693 copying 
vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5047093Z #34 4.693 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5049293Z #34 4.693 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5051701Z #34 4.693 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5054821Z #34 4.694 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5057200Z #34 4.694 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5059507Z #34 4.694 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5062002Z #34 4.694 copying 
vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5064393Z #34 4.694 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5066873Z #34 4.695 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5068953Z #34 4.695 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5071266Z #34 4.695 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5073833Z #34 4.695 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5076249Z #34 4.696 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5078815Z #34 4.696 copying 
vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5081200Z #34 4.696 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5083532Z #34 4.696 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5085854Z #34 4.697 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5088499Z #34 4.698 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5090683Z #34 4.698 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5093513Z #34 4.698 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5095893Z #34 4.698 copying 
vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5098283Z #34 4.699 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5100870Z #34 4.699 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5103445Z #34 4.699 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5105742Z #34 4.700 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5108019Z #34 4.700 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5110534Z #34 4.700 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5113191Z #34 4.701 copying 
vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5115959Z #34 4.701 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5118527Z #34 4.701 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5120814Z #34 4.701 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5123081Z #34 4.702 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5125410Z #34 4.702 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5127953Z #34 4.702 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5130229Z #34 4.702 copying 
vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5132990Z #34 4.703 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5135322Z #34 4.703 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5137652Z #34 4.703 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5139896Z #34 4.703 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5142208Z #34 4.703 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5144557Z #34 4.704 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5146834Z #34 4.704 copying 
vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5149210Z #34 4.704 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5151855Z #34 4.704 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5154316Z #34 4.705 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5156796Z #34 4.705 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5159172Z #34 4.705 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5161648Z #34 4.705 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5164024Z #34 4.706 copying 
vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5166453Z #34 4.706 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5168697Z #34 4.706 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5171088Z #34 4.706 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5173599Z #34 4.706 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5176057Z #34 4.707 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5178589Z #34 4.707 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5180952Z #34 4.707 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5183516Z #34 4.708 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5185978Z #34 4.708 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5188351Z #34 4.708 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5190529Z #34 4.708 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5193182Z #34 4.709 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5195624Z #34 4.709 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5198196Z #34 4.709 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5200837Z #34 4.709 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5203275Z #34 4.710 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5205879Z #34 4.710 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5208297Z #34 4.710 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5210574Z #34 4.710 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5213260Z #34 4.710 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5215916Z #34 4.711 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5218411Z #34 4.711 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5220855Z #34 4.711 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5223649Z #34 4.711 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5226332Z #34 4.712 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5228733Z #34 4.712 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5231079Z #34 4.712 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5233411Z #34 4.712 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5235685Z #34 4.713 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5237954Z #34 4.713 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5240566Z #34 4.713 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5243081Z #34 4.713 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5245676Z #34 4.714 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5248063Z #34 4.714 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5250393Z #34 4.714 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5253175Z #34 4.714 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5255682Z #34 4.714 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5258017Z #34 4.715 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5260292Z #34 4.715 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5262774Z #34 4.715 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5265228Z #34 4.715 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5267364Z #34 4.716 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5269609Z #34 4.716 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5272187Z #34 4.716 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5274674Z #34 4.716 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5277175Z #34 4.717 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5279519Z #34 4.717 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5281937Z #34 4.717 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5284070Z #34 4.718 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5286158Z #34 4.718 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5288474Z #34 4.718 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5291064Z #34 4.718 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5293867Z #34 4.718 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5296344Z #34 4.719 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5298956Z #34 4.719 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5301203Z #34 4.719 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5303581Z #34 4.719 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5306025Z #34 4.720 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5308404Z #34 4.720 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5310737Z #34 4.720 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5313175Z #34 4.720 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5315623Z #34 4.721 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5318104Z #34 4.721 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5320512Z #34 4.721 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5322783Z #34 4.721 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5325134Z #34 4.721 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5327337Z #34 4.722 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5329565Z #34 4.722 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5332064Z #34 4.722 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5334502Z #34 4.722 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5337019Z #34 4.723 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5339255Z #34 4.723 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5341659Z #34 4.723 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5343980Z #34 4.723 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5346372Z #34 4.724 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5348453Z #34 4.724 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5350762Z #34 4.724 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5353413Z #34 4.724 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5355605Z #34 4.724 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5358071Z #34 4.725 copying vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5360422Z #34 4.725 copying vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5362891Z #34 4.725 copying vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.5364402Z #34 4.725 creating build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn 2025-09-07T06:54:19.5365283Z #34 4.726 copying vllm/vllm_flash_attn/.gitkeep -> build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn 2025-09-07T06:54:19.5366248Z #34 4.726 copying vllm/distributed/kv_transfer/README.md -> 
build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer 2025-09-07T06:54:19.5367726Z #34 4.726 copying vllm/distributed/kv_transfer/disagg_prefill_workflow.jpg -> build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer 2025-09-07T06:54:19.5369521Z #34 4.726 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5371572Z #34 4.727 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5374006Z #34 4.727 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5376151Z #34 4.727 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5378320Z #34 4.727 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5380330Z #34 4.728 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5382261Z #34 4.728 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5384039Z #34 4.728 copying 
vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5386080Z #34 4.728 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5388121Z #34 4.729 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5390081Z #34 4.729 copying vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5391861Z #34 4.729 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5395514Z #34 4.730 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5397800Z #34 4.730 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5399499Z #34 4.730 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5401277Z #34 4.730 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20-3e.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5402946Z #34 4.731 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5404664Z #34 4.731 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5406334Z #34 4.731 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=352,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5408186Z #34 4.731 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5410058Z #34 4.732 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5411964Z #34 4.732 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5413972Z #34 4.732 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5415586Z #34 4.732 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5417346Z #34 4.733 copying 
vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5419111Z #34 4.733 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5420815Z #34 4.733 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5422589Z #34 4.733 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5424376Z #34 4.734 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5426345Z #34 4.734 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5428219Z #34 4.734 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5430024Z #34 4.734 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5431894Z #34 4.735 copying 
vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5433613Z #34 4.735 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5435301Z #34 4.735 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5437004Z #34 4.736 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5438553Z #34 4.736 copying vllm/model_executor/layers/fused_moe/configs/E=128,N=96,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5440141Z #34 4.736 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5441801Z #34 4.736 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5443460Z #34 4.737 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5445024Z #34 4.737 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_H100.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T06:54:19.5446631Z #34 4.737 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5448275Z #34 4.737 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5449946Z #34 4.738 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5451671Z #34 4.738 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5453675Z #34 4.738 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5455460Z #34 4.738 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5457244Z #34 4.739 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5458933Z #34 4.739 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5460625Z #34 4.739 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5462435Z #34 4.739 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5464288Z #34 4.740 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5466191Z #34 4.740 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5467966Z #34 4.740 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5469700Z #34 4.740 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5471395Z #34 4.741 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5473171Z #34 4.741 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5474925Z #34 4.741 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T06:54:19.5476632Z #34 4.741 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5478394Z #34 4.742 copying vllm/model_executor/layers/fused_moe/configs/E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5480098Z #34 4.742 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5481730Z #34 4.742 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5483302Z #34 4.742 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=320,device_name=NVIDIA_H20-3e.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5485010Z #34 4.743 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5486841Z #34 4.743 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5488687Z #34 4.743 copying vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5490522Z #34 4.743 copying 
vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5492765Z #34 4.744 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5494720Z #34 4.744 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5496617Z #34 4.744 copying vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5498491Z #34 4.744 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325X,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5500422Z #34 4.745 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5502447Z #34 4.745 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5504408Z #34 4.745 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T06:54:19.5506388Z #34 4.745 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5508250Z #34 4.746 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5510092Z #34 4.746 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5511998Z #34 4.746 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5513826Z #34 4.746 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5515709Z #34 4.747 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5517614Z #34 4.747 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5519527Z #34 4.747 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5521407Z #34 4.747 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5523268Z #34 4.748 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5525119Z #34 4.748 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5526961Z #34 4.748 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5528785Z #34 4.749 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5530611Z #34 4.749 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5532563Z #34 4.749 copying vllm/model_executor/layers/fused_moe/configs/E=256,N=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5534589Z #34 4.749 copying 
vllm/model_executor/layers/fused_moe/configs/E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5536388Z #34 4.750 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5538287Z #34 4.750 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5540163Z #34 4.750 copying vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5541951Z #34 4.750 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=1408,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5543659Z #34 4.751 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=176,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5545432Z #34 4.751 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=352,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5547052Z #34 4.751 copying vllm/model_executor/layers/fused_moe/configs/E=60,N=704,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5548686Z #34 4.751 copying vllm/model_executor/layers/fused_moe/configs/E=62,N=256,device_name=NVIDIA_H100_80GB_HBM3.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5550305Z #34 4.752 copying vllm/model_executor/layers/fused_moe/configs/E=62,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5551962Z #34 4.752 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5553644Z #34 4.752 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5555340Z #34 4.752 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5557089Z #34 4.753 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5558766Z #34 4.753 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5560384Z #34 4.753 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5562003Z #34 4.753 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5563731Z #34 4.754 copying 
vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5565472Z #34 4.754 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5567108Z #34 4.754 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5568741Z #34 4.754 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5570347Z #34 4.755 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5572008Z #34 4.755 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5574009Z #34 4.755 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5575710Z #34 4.755 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5577379Z #34 4.756 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T06:54:19.5579051Z #34 4.756 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5580691Z #34 4.756 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5582335Z #34 4.756 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5584064Z #34 4.757 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A800-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5585905Z #34 4.757 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5587710Z #34 4.757 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5589412Z #34 4.757 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5591059Z #34 4.758 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5593013Z #34 4.758 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5594677Z #34 4.758 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5596380Z #34 4.758 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5597977Z #34 4.759 copying vllm/model_executor/layers/fused_moe/configs/E=64,N=896,device_name=NVIDIA_H20.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5599616Z #34 4.759 copying vllm/model_executor/layers/fused_moe/configs/E=72,N=384,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5601283Z #34 4.759 copying vllm/model_executor/layers/fused_moe/configs/E=72,N=768,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5603049Z #34 4.759 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5605040Z #34 4.760 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5606692Z #34 4.760 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5608346Z #34 4.760 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5610001Z #34 4.760 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.5611670Z #34 4.761 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7091347Z #34 4.761 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7093930Z #34 4.761 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7095714Z #34 4.762 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7097580Z #34 4.762 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7099339Z #34 4.762 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7101099Z #34 4.762 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7102863Z #34 4.763 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7104596Z #34 4.763 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7106457Z #34 4.763 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7108194Z #34 4.763 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7109789Z #34 4.764 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7111383Z #34 4.764 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7112991Z #34 4.764 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7114599Z #34 4.764 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7116214Z #34 4.765 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7117849Z #34 4.765 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7119485Z #34 4.765 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7121127Z #34 4.765 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7122724Z #34 4.766 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7124398Z #34 4.766 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7126064Z #34 4.766 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7127749Z #34 4.766 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7129440Z #34 4.767 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7131005Z #34 4.767 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7132687Z #34 4.767 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7134586Z #34 4.768 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7136383Z #34 4.768 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7138129Z #34 4.768 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7139810Z #34 4.768 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7141504Z #34 4.769 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7143309Z #34 4.769 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7145229Z #34 4.769 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7146870Z #34 4.769 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7148475Z #34 4.770 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7150034Z #34 4.770 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7151555Z #34 4.770 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_L40S.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7153160Z #34 4.771 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7154847Z #34 4.771 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7156515Z #34 4.771 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7158158Z #34 4.771 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 
2025-09-07T06:54:19.7159749Z #34 4.772 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7161391Z #34 4.772 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7163035Z #34 4.772 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7164641Z #34 4.772 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7166226Z #34 4.773 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7167827Z #34 4.773 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7169471Z #34 4.773 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7171096Z #34 4.773 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7173003Z #34 4.774 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X.json -> 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7174707Z #34 4.774 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7176454Z #34 4.774 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7178211Z #34 4.774 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7179921Z #34 4.775 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7181580Z #34 4.775 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7183280Z #34 4.775 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7185052Z #34 4.775 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7186820Z #34 4.776 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7188462Z #34 4.776 copying 
vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7190110Z #34 4.776 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7191775Z #34 4.776 copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7193797Z #34 4.777 copying vllm/model_executor/layers/fused_moe/configs/README -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs 2025-09-07T06:54:19.7195623Z #34 4.777 copying vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7197906Z #34 4.777 copying vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7200127Z #34 4.777 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7202368Z #34 4.778 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7204747Z #34 4.778 copying 
vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7207007Z #34 4.778 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7209178Z #34 4.779 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7211286Z #34 4.779 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7213619Z #34 4.779 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7215781Z #34 4.779 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7218028Z #34 4.780 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7220281Z #34 4.780 copying 
vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7222522Z #34 4.780 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7224888Z #34 4.780 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7227055Z #34 4.781 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7229264Z #34 4.781 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7231382Z #34 4.781 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7233598Z #34 4.781 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7235487Z #34 4.782 copying 
vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7237427Z #34 4.782 copying vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7239370Z #34 4.782 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7241515Z #34 4.783 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7243614Z #34 4.783 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7245736Z #34 4.783 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7247868Z #34 4.783 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7249964Z #34 4.784 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7252049Z #34 4.784 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7254349Z #34 4.784 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7256489Z #34 4.784 copying vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7258691Z #34 4.785 copying vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7260921Z #34 4.785 copying vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7263098Z #34 4.785 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7265417Z #34 4.785 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7267401Z #34 4.786 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7269428Z #34 4.786 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7271439Z #34 4.786 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7273432Z #34 4.786 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7275377Z #34 4.787 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7277273Z #34 4.787 copying vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7279213Z #34 4.787 copying 
vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7281189Z #34 4.787 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7283134Z #34 4.788 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7285086Z #34 4.788 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7287067Z #34 4.788 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7289065Z #34 4.789 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7291081Z #34 4.789 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7293578Z #34 4.789 copying 
vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7295804Z #34 4.790 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7298060Z #34 4.790 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7300251Z #34 4.790 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7302418Z #34 4.790 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7304670Z #34 4.791 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7306843Z #34 4.791 copying vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7308763Z #34 4.791 copying 
vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7310749Z #34 4.791 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7312744Z #34 4.792 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7314729Z #34 4.792 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7316721Z #34 4.792 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7318942Z #34 4.792 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7321079Z #34 4.793 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7323129Z #34 4.793 copying 
vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7325140Z #34 4.793 copying vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7327183Z #34 4.793 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7329302Z #34 4.794 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7331423Z #34 4.794 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7333664Z #34 4.794 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7335824Z #34 4.794 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7337984Z #34 4.795 copying 
vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7340123Z #34 4.795 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7342337Z #34 4.795 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7344598Z #34 4.795 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7346863Z #34 4.796 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7348805Z #34 4.796 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7350757Z #34 4.796 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7352735Z #34 4.796 copying 
vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7354900Z #34 4.797 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7356921Z #34 4.797 copying vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7358969Z #34 4.797 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7361081Z #34 4.798 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7363190Z #34 4.798 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7365495Z #34 4.798 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7367687Z #34 4.798 copying 
vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7369811Z #34 4.799 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7371961Z #34 4.799 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7374401Z #34 4.799 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7376563Z #34 4.799 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7378728Z #34 4.800 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7380890Z #34 4.800 copying vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7383057Z #34 4.800 copying 
vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7385488Z #34 4.800 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7387808Z #34 4.801 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7389982Z #34 4.801 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7392510Z #34 4.801 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7394794Z #34 4.801 copying vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7396997Z #34 4.802 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7399209Z #34 4.802 copying 
vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7401447Z #34 4.802 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7403645Z #34 4.802 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7406036Z #34 4.803 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7408113Z #34 4.803 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7410136Z #34 4.803 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7412148Z #34 4.803 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7414527Z #34 4.804 copying 
vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7416721Z #34 4.804 copying vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7418956Z #34 4.804 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7421191Z #34 4.804 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7423414Z #34 4.805 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7425731Z #34 4.805 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7427798Z #34 4.805 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7429854Z #34 4.805 copying 
vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7431878Z #34 4.806 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7433894Z #34 4.806 copying vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7435931Z #34 4.806 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7438053Z #34 4.807 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7440173Z #34 4.807 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7442232Z #34 4.807 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7444282Z #34 4.807 copying 
vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7446280Z #34 4.808 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7448311Z #34 4.808 copying vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7450360Z #34 4.808 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7452491Z #34 4.809 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7454874Z #34 4.809 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7457159Z #34 4.809 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7459399Z #34 4.809 copying 
vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7461594Z #34 4.810 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7463778Z #34 4.810 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7466014Z #34 4.810 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7468010Z #34 4.810 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7470044Z #34 4.811 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7472080Z #34 4.811 copying vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7474082Z #34 4.811 copying 
vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7476217Z #34 4.811 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7478198Z #34 4.812 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7480210Z #34 4.812 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7482195Z #34 4.812 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7484213Z #34 4.812 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7486207Z #34 4.813 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7488164Z #34 4.813 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7490078Z #34 4.813 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7492109Z #34 4.813 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7494491Z #34 4.814 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7496728Z #34 4.814 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7499037Z #34 4.814 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7501346Z #34 4.814 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7503596Z #34 4.815 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7505946Z #34 4.815 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7507885Z #34 4.815 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7509783Z #34 4.816 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7511728Z #34 4.816 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7513663Z #34 4.816 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7515630Z #34 4.816 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7517768Z #34 4.817 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7519767Z #34 4.817 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7521981Z #34 4.817 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7524073Z #34 4.817 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7526128Z #34 4.818 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7528143Z #34 4.818 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7530441Z #34 4.818 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7532714Z #34 4.818 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7535134Z #34 4.819 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7537402Z #34 4.819 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7539663Z #34 4.819 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7541890Z #34 4.819 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7544142Z #34 4.820 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7546555Z #34 4.820 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7548677Z #34 4.820 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7550752Z #34 4.820 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7552971Z #34 4.821 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7555075Z #34 4.821 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7557210Z #34 4.821 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7559376Z #34 4.821 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7561544Z #34 4.822 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7563760Z #34 4.822 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7565987Z #34 4.822 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7568201Z #34 4.822 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7570270Z #34 4.823 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7572339Z #34 4.823 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7574762Z #34 4.823 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7576918Z #34 4.823 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7579077Z #34 4.824 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7581235Z #34 4.824 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7583455Z #34 4.824 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7585761Z #34 4.825 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7587877Z #34 4.825 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7589941Z #34 4.825 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7592125Z #34 4.825 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7594507Z #34 4.826 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7596665Z #34 4.826 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7598850Z #34 4.826 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7601043Z #34 4.826 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7603269Z #34 4.827 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7605540Z #34 4.827 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7607533Z #34 4.827 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7609485Z #34 4.827 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7611437Z #34 4.828 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7613642Z #34 4.828 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7615844Z #34 4.828 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7618030Z #34 4.829 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7620235Z #34 4.829 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7622460Z #34 4.829 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7624656Z #34 4.829 copying 
vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7626863Z #34 4.830 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7628783Z #34 4.830 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7630685Z #34 4.830 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7632616Z #34 4.830 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7634583Z #34 4.831 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7636571Z #34 4.831 copying vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7638579Z #34 4.831 copying 
vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7640548Z #34 4.831 copying vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7642533Z #34 4.832 copying vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7644266Z #34 4.832 copying vllm/model_executor/layers/quantization/utils/configs/README.md -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T06:54:19.7645494Z #34 4.832 copying vllm/plugins/lora_resolvers/README.md -> build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers 2025-09-07T06:54:19.7646595Z #34 4.832 copying vllm/transformers_utils/chat_templates/template_basic.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T06:54:19.7647845Z #34 4.833 copying vllm/transformers_utils/chat_templates/template_blip2.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T06:54:19.7649080Z #34 4.833 copying vllm/transformers_utils/chat_templates/template_chatml.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T06:54:19.7650363Z #34 4.833 copying vllm/transformers_utils/chat_templates/template_deepseek_vl2.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T06:54:19.7651627Z #34 4.833 copying 
vllm/transformers_utils/chat_templates/template_fuyu.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T06:54:19.7653131Z #34 4.833 copying vllm/transformers_utils/chat_templates/template_minicpmv45.jinja -> build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates 2025-09-07T06:54:19.7654039Z #34 4.846 running build_ext 2025-09-07T06:54:19.7654419Z #34 5.041 Using MAX_JOBS=42 as the number of jobs. 2025-09-07T06:54:19.9060844Z #34 5.043 Using NVCC_THREADS=4 as the number of nvcc threads. 2025-09-07T06:54:20.0383441Z #34 5.326 -- The CXX compiler identification is GNU 13.3.1 2025-09-07T06:54:20.1617217Z #34 5.340 -- Detecting CXX compiler ABI info 2025-09-07T06:54:20.1618018Z #34 5.450 -- Detecting CXX compiler ABI info - done 2025-09-07T06:54:20.3466838Z #34 5.469 -- Check for working CXX compiler: /opt/rh/gcc-toolset-13/root/usr/bin/c++ - skipped 2025-09-07T06:54:20.3467471Z #34 5.469 -- Detecting CXX compile features 2025-09-07T06:54:20.3467874Z #34 5.470 -- Detecting CXX compile features - done 2025-09-07T06:54:20.3468278Z #34 5.484 -- Build type: Release 2025-09-07T06:54:20.3468613Z #34 5.484 -- Target device: cuda 2025-09-07T06:54:20.3534612Z #34 5.641 -- Found Python: /opt/python/cp312-cp312/bin/python3 (found version "3.12.11") found components: Interpreter Development.Module Development.SABIModule 2025-09-07T06:54:20.5041207Z #34 5.641 -- Found python matching: /opt/python/cp312-cp312/bin/python3. 
2025-09-07T06:54:22.2143376Z #34 7.502 -- Found CUDA: /usr/local/cuda (found version "12.8") 2025-09-07T06:54:23.2878760Z #34 8.575 -- The CUDA compiler identification is NVIDIA 12.8.93 with host compiler GNU 13.3.1 2025-09-07T06:54:23.4496791Z #34 8.587 -- Detecting CUDA compiler ABI info 2025-09-07T06:54:24.2973380Z #34 9.585 -- Detecting CUDA compiler ABI info - done 2025-09-07T06:54:24.4915327Z #34 9.646 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped 2025-09-07T06:54:24.4915968Z #34 9.650 -- Detecting CUDA compile features 2025-09-07T06:54:24.4916368Z #34 9.651 -- Detecting CUDA compile features - done 2025-09-07T06:54:24.4917121Z #34 9.665 -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.8.93") 2025-09-07T06:54:24.4917668Z #34 9.675 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD 2025-09-07T06:54:24.4918138Z #34 9.779 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed 2025-09-07T06:54:24.4918612Z #34 9.779 -- Looking for pthread_create in pthreads 2025-09-07T06:54:24.6581485Z #34 9.846 -- Looking for pthread_create in pthreads - not found 2025-09-07T06:54:24.6582054Z #34 9.846 -- Looking for pthread_create in pthread 2025-09-07T06:54:24.6582520Z #34 9.946 -- Looking for pthread_create in pthread - found 2025-09-07T06:54:24.8888953Z #34 9.947 -- Found Threads: TRUE 2025-09-07T06:54:24.8889395Z #34 10.04 -- PyTorch: CUDA detected: 12.8 2025-09-07T06:54:24.8889821Z #34 10.04 -- PyTorch: CUDA nvcc is: /usr/local/cuda/bin/nvcc 2025-09-07T06:54:24.8890303Z #34 10.04 -- PyTorch: CUDA toolkit directory: /usr/local/cuda 2025-09-07T06:54:24.8890733Z #34 10.18 -- PyTorch: Header version is: 12.8 2025-09-07T06:54:25.0855836Z #34 10.20 -- Found Python: /opt/python/cp312-cp312/bin/python3 (found version "3.12.11") found components: Interpreter 2025-09-07T06:54:25.0857087Z #34 10.20 CMake Warning at /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:140 (message): 2025-09-07T06:54:25.0857955Z 
#34 10.20 Failed to compute shorthash for libnvrtc.so 2025-09-07T06:54:25.0858395Z #34 10.20 Call Stack (most recent call first): 2025-09-07T06:54:25.0859169Z #34 10.20 /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-09-07T06:54:25.0860292Z #34 10.20 /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-09-07T06:54:25.0861062Z #34 10.20 CMakeLists.txt:80 (find_package) 2025-09-07T06:54:25.0861396Z #34 10.20 2025-09-07T06:54:25.0861625Z #34 10.20 2025-09-07T06:54:25.0861940Z #34 10.20 -- USE_CUDNN is set to 0. Compiling without cuDNN support 2025-09-07T06:54:25.0862525Z #34 10.20 -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support 2025-09-07T06:54:25.0863104Z #34 10.20 -- USE_CUDSS is set to 0. Compiling without cuDSS support 2025-09-07T06:54:25.0863622Z #34 10.20 -- USE_CUFILE is set to 0. Compiling without cuFile support 2025-09-07T06:54:25.0864612Z #34 10.20 CMake Warning at /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:323 (message): 2025-09-07T06:54:25.0865665Z #34 10.20 pytorch is not compatible with `CMAKE_CUDA_ARCHITECTURES` and will ignore 2025-09-07T06:54:25.0866278Z #34 10.20 its value. Please configure `TORCH_CUDA_ARCH_LIST` instead. 
2025-09-07T06:54:25.0866816Z #34 10.20 Call Stack (most recent call first): 2025-09-07T06:54:25.0867637Z #34 10.20 /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-09-07T06:54:25.0868699Z #34 10.20 /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-09-07T06:54:25.0869422Z #34 10.20 CMakeLists.txt:80 (find_package) 2025-09-07T06:54:25.0870025Z #34 10.20 2025-09-07T06:54:25.0870230Z #34 10.20 2025-09-07T06:54:25.0871160Z #34 10.20 -- Added CUDA NVCC flags for: -gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_100,code=sm_100;-gencode;arch=compute_120,code=sm_120 2025-09-07T06:54:25.0872608Z #34 10.22 CMake Warning at /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message): 2025-09-07T06:54:25.0873443Z #34 10.22 static library kineto_LIBRARY-NOTFOUND not found. 2025-09-07T06:54:25.0873885Z #34 10.22 Call Stack (most recent call first): 2025-09-07T06:54:25.0874695Z #34 10.22 /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found) 2025-09-07T06:54:25.0875523Z #34 10.22 CMakeLists.txt:80 (find_package) 2025-09-07T06:54:25.0875860Z #34 10.22 2025-09-07T06:54:25.0876124Z #34 10.22 2025-09-07T06:54:25.0876606Z #34 10.22 -- Found Torch: /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib/libtorch.so 2025-09-07T06:54:25.0877235Z #34 10.22 CMake Warning at CMakeLists.txt:112 (message): 2025-09-07T06:54:25.0877764Z #34 10.22 Pytorch version 2.8.0 expected for CUDA build, saw 2.9.0 instead. 
2025-09-07T06:54:25.0878201Z #34 10.22 2025-09-07T06:54:25.0878423Z #34 10.22 2025-09-07T06:54:25.0878707Z #34 10.22 -- CUDA target architectures: 8.0;8.9;9.0;10.0;12.0 2025-09-07T06:54:25.0879208Z #34 10.22 -- CUDA supported target architectures: 8.0;8.9;9.0;10.0;12.0 2025-09-07T06:54:26.8418350Z #34 12.13 -- FetchContent base directory: /workspace/.deps 2025-09-07T06:54:26.9927619Z #34 12.13 -- Enabling cumem allocator extension. 2025-09-07T06:54:31.0064602Z #34 16.29 -- CMake Version: 4.1.0 2025-09-07T06:54:31.2310901Z #34 16.29 -- CUTLASS 4.0.0 2025-09-07T06:54:31.2311798Z #34 16.30 -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.93") 2025-09-07T06:54:31.2312483Z #34 16.36 -- CUDART: /usr/local/cuda/lib64/libcudart.so 2025-09-07T06:54:31.2312933Z #34 16.36 -- CUDA Driver: /usr/local/cuda/lib64/stubs/libcuda.so 2025-09-07T06:54:31.2313390Z #34 16.36 -- NVRTC: /usr/local/cuda/lib64/libnvrtc.so 2025-09-07T06:54:31.2313797Z #34 16.37 -- Default Install Location: install 2025-09-07T06:54:31.2419138Z #34 16.53 -- Found Python3: /opt/python/cp312-cp312/bin/python3.12 (found suitable version "3.12.11", minimum required is "3.5") found components: Interpreter 2025-09-07T06:54:31.3742026Z #34 16.66 -- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a;100;100a;120;120a;101;101a 2025-09-07T06:54:31.5902395Z #34 16.66 -- Enable caching of reference results in conv unit tests 2025-09-07T06:54:31.5903001Z #34 16.66 -- Enable rigorous conv problem sizes in conv unit tests 2025-09-07T06:54:31.5903809Z #34 16.66 -- Grid Dependency Control (GDC) is enabled for SM100 kernels (required for programmatic dependent launches). 
2025-09-07T06:54:31.5904526Z #34 16.66 -- Using the following NVCC flags: 2025-09-07T06:54:31.5905015Z #34 16.66 --expt-relaxed-constexpr 2025-09-07T06:54:31.5905377Z #34 16.66 -ftemplate-backtrace-limit=0 2025-09-07T06:54:31.5905726Z #34 16.66 -DCUTLASS_TEST_LEVEL=0 2025-09-07T06:54:31.5906294Z #34 16.66 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 2025-09-07T06:54:31.5906730Z #34 16.66 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 2025-09-07T06:54:31.5907157Z #34 16.66 -DCUTLASS_DEBUG_TRACE_LEVEL=0 2025-09-07T06:54:31.5907504Z #34 16.66 -Xcompiler=-Wconversion 2025-09-07T06:54:31.5907937Z #34 16.66 -Xcompiler=-fno-strict-aliasing 2025-09-07T06:54:31.5908402Z #34 16.71 -- Configuring cublas ... 2025-09-07T06:54:31.5908827Z #34 16.71 -- cuBLAS Disabled. 2025-09-07T06:54:31.5909365Z #34 16.71 -- Configuring cuBLAS ... done. 2025-09-07T06:54:31.5909945Z #34 16.73 -- Marlin generation script hash: abd33f08f337455f84516269e0f85ed7 2025-09-07T06:54:31.5910538Z #34 16.73 -- Last run Marlin generate script hash: 2025-09-07T06:54:32.4862638Z #34 17.77 -- Marlin generation completed successfully. 
2025-09-07T06:54:32.6419340Z #34 17.78 -- Building Marlin kernels for archs: 8.0;8.7;9.0+PTX 2025-09-07T06:54:32.6420098Z #34 17.78 -- Building AllSpark kernels for archs: 8.0;8.9 2025-09-07T06:54:32.6420734Z #34 17.78 -- Building scaled_mm_c3x_sm90 for archs: 9.0a 2025-09-07T06:54:32.6421251Z #34 17.78 -- Building scaled_mm_c3x_sm120 for archs: 12.0a 2025-09-07T06:54:32.6421902Z #34 17.78 -- Building scaled_mm_c3x_sm100 for archs: 10.0a 2025-09-07T06:54:32.6422462Z #34 17.78 -- Building scaled_mm_c2x for archs: 8.0;8.9+PTX 2025-09-07T06:54:32.6430016Z #34 17.78 -- Building sparse_scaled_mm_c3x for archs: 9.0a 2025-09-07T06:54:32.6430476Z #34 17.78 -- Building NVFP4 for archs: 12.0a 2025-09-07T06:54:32.6430848Z #34 17.78 -- Building NVFP4 for archs: 10.0a 2025-09-07T06:54:32.6431244Z #34 17.78 -- Building CUTLASS MLA for archs: 10.0a 2025-09-07T06:54:32.6431668Z #34 17.78 -- Building grouped_mm_c3x for archs: 9.0a 2025-09-07T06:54:32.6432273Z #34 17.78 -- Building grouped_mm_c3x for archs: 10.0a 2025-09-07T06:54:32.6432701Z #34 17.78 -- Building moe_data for archs: 9.0a;10.0a 2025-09-07T06:54:32.6433170Z #34 17.78 -- Building blockwise_scaled_group_mm_sm100 for archs: 10.0a 2025-09-07T06:54:32.6433766Z #34 17.78 -- Machete generation script hash: 54d14089cd629a0eee221067f44a0b46 2025-09-07T06:54:32.6434278Z #34 17.78 -- Last run machete generate script hash: 2025-09-07T06:54:32.7888519Z #34 18.08 -- Machete generation completed successfully. 2025-09-07T06:54:32.9419347Z #34 18.08 -- Building Machete kernels for archs: 9.0a 2025-09-07T06:54:32.9419859Z #34 18.08 -- Building W4A8 kernels for archs: 9.0a 2025-09-07T06:54:32.9420273Z #34 18.08 -- Enabling C extension. 2025-09-07T06:54:32.9420747Z #34 18.08 -- Marlin MOE generation script hash: e42dc1ed5a7c83988cc21a1bf57c6b6d 2025-09-07T06:54:32.9421312Z #34 18.08 -- Last run Marlin MOE generate script hash: 2025-09-07T06:54:33.4375546Z #34 18.73 -- Marlin MOE generation completed successfully. 
2025-09-07T06:54:33.5965507Z #34 18.73 -- Building Marlin MOE kernels for archs: 8.0;8.7;9.0+PTX 2025-09-07T06:54:33.5966746Z #34 18.73 -- Enabling moe extension. 2025-09-07T06:54:33.5968936Z #34 18.73 CMake Warning (dev) at /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:1564 (cmake_parse_arguments): 2025-09-07T06:54:33.5970123Z #34 18.73 The BUILD_COMMAND keyword was followed by an empty string or no value at 2025-09-07T06:54:33.5970715Z #34 18.73 all. Policy CMP0174 is not set, so cmake_parse_arguments() will unset the 2025-09-07T06:54:33.5971316Z #34 18.73 ARG_BUILD_COMMAND variable rather than setting it to an empty string. 2025-09-07T06:54:33.5971800Z #34 18.73 Call Stack (most recent call first): 2025-09-07T06:54:33.5972973Z #34 18.73 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2145:EVAL:2 (__FetchContent_doPopulation) 2025-09-07T06:54:33.5974585Z #34 18.73 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2145 (cmake_language) 2025-09-07T06:54:33.5975984Z #34 18.73 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2384 (__FetchContent_Populate) 2025-09-07T06:54:33.5977240Z #34 18.73 cmake/external_projects/flashmla.cmake:30 (FetchContent_MakeAvailable) 2025-09-07T06:54:33.5977800Z #34 18.73 CMakeLists.txt:942 (include) 2025-09-07T06:54:33.5978297Z #34 18.73 This warning is for project developers. Use -Wno-dev to suppress it. 
2025-09-07T06:54:33.5978861Z #34 18.73 2025-09-07T06:54:33.5979810Z #34 18.73 CMake Warning (dev) at /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:1564 (cmake_parse_arguments): 2025-09-07T06:54:33.5980916Z #34 18.73 The CONFIGURE_COMMAND keyword was followed by an empty string or no value 2025-09-07T06:54:33.5981587Z #34 18.73 at all. Policy CMP0174 is not set, so cmake_parse_arguments() will unset 2025-09-07T06:54:33.5982210Z #34 18.73 the ARG_CONFIGURE_COMMAND variable rather than setting it to an empty 2025-09-07T06:54:33.5982789Z #34 18.73 string. 2025-09-07T06:54:33.5983064Z #34 18.73 Call Stack (most recent call first): 2025-09-07T06:54:33.5984103Z #34 18.73 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2145:EVAL:2 (__FetchContent_doPopulation) 2025-09-07T06:54:33.5985375Z #34 18.73 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2145 (cmake_language) 2025-09-07T06:54:33.5986579Z #34 18.73 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/cmake/data/share/cmake-4.1/Modules/FetchContent.cmake:2384 (__FetchContent_Populate) 2025-09-07T06:54:33.5987527Z #34 18.73 cmake/external_projects/flashmla.cmake:30 (FetchContent_MakeAvailable) 2025-09-07T06:54:33.5988062Z #34 18.73 CMakeLists.txt:942 (include) 2025-09-07T06:54:33.5988695Z #34 18.73 This warning is for project developers. Use -Wno-dev to suppress it. 
2025-09-07T06:54:33.5989155Z #34 18.73 2025-09-07T06:54:36.8120650Z #34 22.10 -- FlashMLA is available at /workspace/.deps/flashmla-src 2025-09-07T06:54:44.3217768Z #34 29.61 -- Build type: Release 2025-09-07T06:54:44.3218187Z #34 29.61 -- Target device: cuda 2025-09-07T06:54:44.5185683Z #34 29.66 -- Found Python: /opt/python/cp312-cp312/bin/python3 (found version "3.12.11") found components: Interpreter Development.Module Development.SABIModule 2025-09-07T06:54:46.2244473Z #34 31.51 -- CUDA target architectures: 8.0;8.9;9.0;10.0;12.0 2025-09-07T06:54:46.3751128Z #34 31.51 CMake Warning at .deps/vllm-flash-attn-src/CMakeLists.txt:75 (message): 2025-09-07T06:54:46.3751815Z #34 31.51 Pytorch version 2.4.0 expected for CUDA build, saw 2.9.0 instead. 2025-09-07T06:54:46.3752260Z #34 31.51 2025-09-07T06:54:46.3752489Z #34 31.51 2025-09-07T06:54:46.3753063Z #34 31.51 -- CUDA supported target architectures: 8.0;8.9;9.0;10.0;12.0 2025-09-07T06:54:48.1473037Z #34 33.43 -- FA2_ARCHS: 8.0+PTX 2025-09-07T06:54:48.2614484Z #34 33.44 -- FA3_ARCHS: 9.0a;8.0 2025-09-07T06:54:48.2615069Z #34 33.45 -- vllm-flash-attn is available at /workspace/.deps/vllm-flash-attn-src 2025-09-07T06:54:48.2615625Z #34 33.45 -- Configuring done (28.2s) 2025-09-07T06:54:48.2615982Z #34 33.53 -- Generating done (0.1s) 2025-09-07T06:54:48.2616509Z #34 33.53 -- Build files have been written to: /workspace/build/temp.linux-x86_64-cpython-312 2025-09-07T06:54:48.2617118Z #34 33.55 Using MAX_JOBS=42 as the number of jobs. 2025-09-07T06:54:48.4141036Z #34 33.55 Using NVCC_THREADS=4 as the number of nvcc threads. 
2025-09-07T06:54:48.8153074Z #34 34.10 [1/510] Building CXX object CMakeFiles/cumem_allocator.dir/csrc/cumem_allocator.cpp.o 2025-09-07T06:56:12.7044104Z #34 118.0 [2/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/merge_attn_states.cu.o 2025-09-07T06:56:14.2379296Z #34 119.5 [3/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/vertical_slash_index.cu.o 2025-09-07T06:56:15.6292480Z #34 120.9 [4/510] Building CUDA object CMakeFiles/_C.dir/csrc/pos_encoding_kernels.cu.o 2025-09-07T06:56:18.9616675Z #34 124.2 [5/510] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_quant_kernels.cu.o 2025-09-07T06:56:21.8913566Z #34 127.2 [6/510] Building CUDA object CMakeFiles/_C.dir/csrc/activation_kernels.cu.o 2025-09-07T06:56:24.1871720Z #34 129.5 [7/510] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o 2025-09-07T06:56:24.5397498Z #34 129.8 [8/510] Building CUDA object CMakeFiles/_C.dir/csrc/layernorm_kernels.cu.o 2025-09-07T06:57:34.4748301Z #34 199.8 [9/510] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_view.cu.o 2025-09-07T06:57:47.4547261Z #34 212.7 [10/510] Building CUDA object CMakeFiles/_C.dir/csrc/sampler.cu.o 2025-09-07T06:57:48.5336151Z #34 213.8 [11/510] Building CUDA object CMakeFiles/_C.dir/csrc/cuda_utils_kernels.cu.o 2025-09-07T06:57:50.0766216Z #34 215.4 [12/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/compressed_tensors/int8_quant_kernels.cu.o 2025-09-07T06:57:56.4622205Z #34 221.7 [13/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/common.cu.o 2025-09-07T06:58:00.5339313Z #34 225.8 [14/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fused_kernels/fused_layernorm_dynamic_per_token_quant.cu.o 2025-09-07T06:58:03.8347869Z #34 229.1 [15/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq/q_gemm.cu.o 2025-09-07T06:58:08.6610588Z #34 233.9 [16/510] Building CUDA object CMakeFiles/_C.dir/csrc/mamba/mamba_ssm/selective_scan_fwd.cu.o 2025-09-07T06:58:17.1214319Z #34 
242.4 [17/510] Building CXX object CMakeFiles/_C.dir/csrc/torch_bindings.cpp.o 2025-09-07T06:58:36.3186403Z #34 261.6 [18/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gguf/gguf_kernel.cu.o 2025-09-07T06:58:59.5637115Z #34 284.9 [19/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/activation_kernels.cu.o 2025-09-07T06:59:12.9541576Z #34 298.2 [20/510] Building CUDA object CMakeFiles/_C.dir/csrc/custom_all_reduce.cu.o 2025-09-07T06:59:13.1401576Z #34 298.4 [21/510] Building CXX object CMakeFiles/_C.dir/csrc/cutlass_extensions/common.cpp.o 2025-09-07T06:59:16.5943177Z #34 301.9 [22/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/awq/gemm_kernels.cu.o 2025-09-07T06:59:22.1083727Z #34 307.4 [23/510] Building CUDA object CMakeFiles/_C.dir/csrc/permute_cols.cu.o 2025-09-07T06:59:25.3812161Z #34 310.7 [24/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_entry.cu.o 2025-09-07T06:59:25.5849925Z #34 310.9 [25/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_quant_entry.cu.o 2025-09-07T06:59:32.4849947Z #34 317.8 [26/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_scaled_mm_entry.cu.o 2025-09-07T07:00:20.5927540Z #34 365.9 [27/510] Building CUDA object CMakeFiles/_C.dir/csrc/sparse/cutlass/sparse_scaled_mm_entry.cu.o 2025-09-07T07:00:28.7152218Z #34 374.0 [28/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/mla/cutlass_mla_entry.cu.o 2025-09-07T07:00:48.1716283Z #34 393.5 [29/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp8/per_token_group_quant.cu.o 2025-09-07T07:00:57.2412050Z #34 402.5 [30/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_bf16_kfe4m3fn.cu.o 2025-09-07T07:01:05.3231895Z #34 410.6 [31/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_bf16_kfe2m1f.cu.o 2025-09-07T07:01:38.3422361Z #34 443.6 [32/510] Building CUDA object 
CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_bf16_ku4.cu.o 2025-09-07T07:01:42.1401350Z #34 447.4 [33/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_blockwise_moe_kernel.cu.o 2025-09-07T07:01:43.6160245Z #34 448.9 [34/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_fp16_kfe2m1f.cu.o 2025-09-07T07:02:04.5140452Z #34 469.8 [35/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_bf16_ku4b8.cu.o 2025-09-07T07:02:17.9399758Z #34 483.2 [36/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_fp16_kfe4m3fn.cu.o 2025-09-07T07:02:51.2508807Z #34 516.5 [37/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin.cu.o 2025-09-07T07:02:56.2624189Z #34 521.6 [38/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_bf16_ku8b128.cu.o 2025-09-07T07:03:07.8914263Z #34 533.2 [39/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/gptq_marlin_repack.cu.o 2025-09-07T07:03:14.7207247Z #34 540.0 [40/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/marlin/sparse/marlin_24_cuda_kernel.cu.o 2025-09-07T07:03:19.8872031Z #34 545.2 [41/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/awq_marlin_repack.cu.o 2025-09-07T07:03:20.2315069Z #34 545.5 [42/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_fp16_ku4.cu.o 2025-09-07T07:03:29.5658004Z #34 554.9 [43/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_fp16_ku4b8.cu.o 2025-09-07T07:03:48.4475904Z #34 573.7 [44/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_allspark/allspark_repack.cu.o 2025-09-07T07:03:52.1017732Z #34 577.4 [45/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/paged_attention_v1.cu.o 2025-09-07T07:03:58.1814346Z #34 583.5 [46/510] Building CUDA object 
CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c3x_sm90.cu.o 2025-09-07T07:04:02.5880163Z #34 587.9 [47/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_allspark/allspark_qgemm_w8a16.cu.o 2025-09-07T07:04:02.7162356Z #34 588.0 [48/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/gptq_marlin/kernel_fp16_ku8b128.cu.o 2025-09-07T07:04:11.7879476Z #34 597.1 [49/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/paged_attention_v2.cu.o 2025-09-07T07:04:37.2209706Z #34 622.5 [50/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c3x_sm120.cu.o 2025-09-07T07:04:52.0664319Z #34 637.4 [51/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c3x_sm100.cu.o 2025-09-07T07:05:12.7836595Z #34 658.1 [52/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm90_fp8.cu.o 2025-09-07T07:05:41.3496466Z #34 686.6 [53/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_sm120_fp8.cu.o 2025-09-07T07:05:43.5515477Z #34 688.8 [54/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm120_fp8.cu.o 2025-09-07T07:06:09.4184437Z #34 714.7 [55/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_quant_kernels.cu.o 2025-09-07T07:06:09.4185990Z #34 714.7 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:06:09.4187469Z #34 714.7 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I13__nv_bfloat16Lb1EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:06:09.4188868Z #34 714.7 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. 
.minnctapersm will be ignored 2025-09-07T07:06:09.4190202Z #34 714.7 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I6__halfLb1EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:06:10.7018770Z #34 716.0 [56/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_sm90_int8.cu.o 2025-09-07T07:06:22.9422279Z #34 728.2 [57/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_sm90_fp8.cu.o 2025-09-07T07:06:33.1726185Z #34 738.5 [58/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_azp_sm90_int8.cu.o 2025-09-07T07:06:36.5835884Z #34 741.9 [59/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_sm100_fp8.cu.o 2025-09-07T07:06:37.2720934Z #34 742.6 [60/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/activation_nvfp4_quant_fusion_kernels.cu.o 2025-09-07T07:06:37.2723714Z #34 742.6 ptxas warning : Value of threads per SM for entry _ZN4vllm24silu_and_cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:06:37.2726511Z #34 742.6 ptxas warning : Value of threads per SM for entry _ZN4vllm24silu_and_cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:06:37.2729492Z #34 742.6 ptxas warning : Value of threads per SM for entry _ZN4vllm24silu_and_cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-09-07T07:06:37.2732307Z #34 742.6 ptxas warning : Value of threads per SM for entry _ZN4vllm24silu_and_cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. 
.minnctapersm will be ignored 2025-09-07T07:07:03.9328323Z #34 769.2 [61/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_experts_quant.cu.o 2025-09-07T07:07:03.9329586Z #34 769.2 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I13__nv_bfloat16Lb0ELb1EEEviiPKT_PKfPjS7_S7_S7_i is out of range. .minnctapersm will be ignored 2025-09-07T07:07:03.9331043Z #34 769.2 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I13__nv_bfloat16Lb0ELb0EEEviiPKT_PKfPjS7_S7_S7_i is out of range. .minnctapersm will be ignored 2025-09-07T07:07:03.9332621Z #34 769.2 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I6__halfLb0ELb1EEEviiPKT_PKfPjS7_S7_S7_i is out of range. .minnctapersm will be ignored 2025-09-07T07:07:03.9334486Z #34 769.2 ptxas warning : Value of threads per SM for entry _ZN4vllm15cvt_fp16_to_fp4I6__halfLb0ELb0EEEviiPKT_PKfPjS7_S7_S7_i is out of range. .minnctapersm will be ignored 2025-09-07T07:07:05.8544574Z #34 771.1 [62/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8.cu.o 2025-09-07T07:07:59.8701723Z #34 825.2 [63/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/moe/moe_data.cu.o 2025-09-07T07:08:06.1168848Z #34 831.4 [64/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_scaled_mm_sm120_kernels.cu.o 2025-09-07T07:08:34.4253530Z #34 859.7 [65/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/mla/cutlass_mla_kernels.cu.o 2025-09-07T07:08:39.7244781Z #34 865.0 [66/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/fp4/nvfp4_scaled_mm_kernels.cu.o 2025-09-07T07:08:47.9443807Z #34 873.2 [67/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/moe/blockwise_scaled_group_mm_sm100.cu.o 2025-09-07T07:09:07.9721703Z #34 893.3 [68/510] Building CUDA object 
CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_dispatch.cu.o 2025-09-07T07:09:09.5814519Z #34 894.9 [69/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/moe/grouped_mm_c3x_sm100.cu.o 2025-09-07T07:09:28.5481732Z #34 913.8 [70/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/moe/grouped_mm_c3x_sm90.cu.o 2025-09-07T07:09:46.3357910Z #34 931.6 [71/510] Building CUDA object CMakeFiles/_C.dir/csrc/attention/mla/sm100_cutlass_mla_kernel.cu.o 2025-09-07T07:10:59.2538085Z #34 1004.5 [72/510] Building CUDA object CMakeFiles/_C.dir/csrc/sparse/cutlass/sparse_scaled_mm_c3x.cu.o 2025-09-07T07:11:17.3849638Z #34 1022.7 [73/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part1.cu.o 2025-09-07T07:11:45.7413991Z #34 1051.0 [74/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part2.cu.o 2025-09-07T07:11:51.4094637Z #34 1056.7 [75/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part3.cu.o 2025-09-07T07:11:59.3219839Z #34 1064.6 [76/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part4.cu.o 2025-09-07T07:12:10.4781690Z #34 1075.8 [77/510] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o 2025-09-07T07:12:16.3961181Z #34 1081.7 [78/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_prepack.cu.o 2025-09-07T07:12:24.8609801Z #34 1090.1 [79/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/machete_pytorch.cu.o 2025-09-07T07:12:24.8611333Z #34 1090.1 nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-09-07T07:12:25.2399564Z #34 1090.5 [80/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part6.cu.o 2025-09-07T07:12:29.9036474Z #34 1095.2 [81/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part5.cu.o 2025-09-07T07:12:43.6185982Z #34 1108.9 [82/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part7.cu.o 2025-09-07T07:13:01.9328613Z #34 1127.2 [83/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/machete/generated/machete_mm_impl_part8.cu.o 2025-09-07T07:13:27.3857464Z #34 1152.7 [84/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/moe_align_sum_kernels.cu.o 2025-09-07T07:13:35.4436569Z #34 1160.7 [85/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/moe_wna16.cu.o 2025-09-07T07:13:40.3224176Z #34 1165.6 [86/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o 2025-09-07T07:13:46.8852262Z #34 1172.2 [87/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/grouped_topk_kernels.cu.o 2025-09-07T07:13:56.2554208Z #34 1181.5 [88/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/permute_unpermute_kernels/moe_permute_unpermute_kernel.cu.o 2025-09-07T07:14:00.2385244Z #34 1185.5 [89/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/moe_permute_unpermute_op.cu.o 2025-09-07T07:14:20.0161743Z #34 1205.3 [90/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_bf16_kfe2m1f.cu.o 2025-09-07T07:14:35.6765091Z #34 1221.0 [91/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_bf16_kfe4m3fn.cu.o 2025-09-07T07:15:04.7499913Z #34 1250.0 [92/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_fp16_kfe2m1f.cu.o 2025-09-07T07:15:34.0230626Z #34 1279.3 [93/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_fp16_kfe4m3fn.cu.o 2025-09-07T07:15:36.1993097Z 
#34 1281.5 [94/510] Building CXX object CMakeFiles/_flashmla_C.dir/.deps/flashmla-src/csrc/flash_api.cpp.o 2025-09-07T07:15:37.3615541Z #34 1282.6 [95/510] Building CUDA object CMakeFiles/_flashmla_C.dir/.deps/flashmla-src/csrc/kernels/get_mla_metadata.cu.o 2025-09-07T07:15:39.1036413Z #34 1284.4 [96/510] Building CUDA object CMakeFiles/_flashmla_C.dir/.deps/flashmla-src/csrc/kernels/mla_combine.cu.o 2025-09-07T07:15:40.8961378Z #34 1286.2 [97/510] Building CUDA object CMakeFiles/_flashmla_C.dir/.deps/flashmla-src/csrc/kernels/splitkv_mla.cu.o 2025-09-07T07:15:42.7251875Z #34 1288.0 [98/510] Building CUDA object CMakeFiles/_flashmla_C.dir/.deps/flashmla-src/csrc/kernels_fp8/flash_fwd_mla_fp8_sm90.cu.o 2025-09-07T07:15:44.7063225Z #34 1290.0 [99/510] Building CXX object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/flash_api.cpp.o 2025-09-07T07:15:46.6257572Z #34 1291.9 [100/510] Building CXX object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/flash_api_sparse.cpp.o 2025-09-07T07:15:46.8465251Z #34 1292.1 [101/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_bf16_ku4.cu.o 2025-09-07T07:15:48.5175954Z #34 1293.8 [102/510] Building CXX object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/flash_api_torch_lib.cpp.o 2025-09-07T07:15:49.4543627Z #34 1294.7 [103/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim128_bf16_causal_sm80.cu.o 2025-09-07T07:15:51.0854804Z #34 1296.4 [104/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim128_bf16_sm80.cu.o 2025-09-07T07:15:52.0615056Z #34 1297.3 [105/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim128_fp16_causal_sm80.cu.o 2025-09-07T07:15:53.5945577Z #34 1298.9 [106/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w4a8/w4a8_mm_entry.cu.o 2025-09-07T07:15:53.8129059Z #34 1298.9 
[107/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim128_fp16_sm80.cu.o 2025-09-07T07:15:54.6084093Z #34 1299.9 [108/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim160_bf16_causal_sm80.cu.o 2025-09-07T07:15:56.1583001Z #34 1301.4 [109/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim160_bf16_sm80.cu.o 2025-09-07T07:15:56.3922213Z #34 1301.5 [110/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim160_fp16_causal_sm80.cu.o 2025-09-07T07:15:57.1485139Z #34 1302.4 [111/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim160_fp16_sm80.cu.o 2025-09-07T07:15:58.7046380Z #34 1304.0 [112/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim192_bf16_causal_sm80.cu.o 2025-09-07T07:15:58.8134844Z #34 1304.1 [113/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim192_bf16_sm80.cu.o 2025-09-07T07:15:59.6790874Z #34 1305.0 [114/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim192_fp16_causal_sm80.cu.o 2025-09-07T07:16:01.1985142Z #34 1306.5 [115/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim192_fp16_sm80.cu.o 2025-09-07T07:16:01.3293051Z #34 1306.6 [116/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim256_bf16_causal_sm80.cu.o 2025-09-07T07:16:02.1640938Z #34 1307.5 [117/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim256_bf16_sm80.cu.o 2025-09-07T07:16:03.6760398Z #34 1309.0 [118/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim256_fp16_causal_sm80.cu.o 2025-09-07T07:16:03.8314741Z #34 1309.1 [119/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim256_fp16_sm80.cu.o 2025-09-07T07:16:04.6272184Z #34 1309.9 [120/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim32_bf16_causal_sm80.cu.o 2025-09-07T07:16:06.1180760Z #34 1311.4 [121/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim32_bf16_sm80.cu.o 2025-09-07T07:16:06.2898025Z #34 1311.6 [122/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim32_fp16_causal_sm80.cu.o 2025-09-07T07:16:07.0776265Z #34 1312.4 [123/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim32_fp16_sm80.cu.o 2025-09-07T07:16:08.5910842Z #34 1313.9 [124/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim64_bf16_causal_sm80.cu.o 2025-09-07T07:16:08.7453218Z #34 1314.0 [125/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim64_bf16_sm80.cu.o 2025-09-07T07:16:09.5905207Z #34 1314.9 [126/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim64_fp16_causal_sm80.cu.o 2025-09-07T07:16:11.0345018Z #34 1316.3 [127/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim64_fp16_sm80.cu.o 2025-09-07T07:16:11.1896796Z #34 1316.5 [128/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim96_bf16_causal_sm80.cu.o 2025-09-07T07:16:12.0227264Z #34 1317.3 [129/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim96_bf16_sm80.cu.o 2025-09-07T07:16:13.5010688Z #34 
1318.8 [130/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim96_fp16_causal_sm80.cu.o 2025-09-07T07:16:13.6341876Z #34 1318.9 [131/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_hdim96_fp16_sm80.cu.o 2025-09-07T07:16:14.4555456Z #34 1319.7 [132/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_sparse_hdim128_bf16_causal_sm80.cu.o 2025-09-07T07:16:14.8391461Z #34 1320.1 [133/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/ops.cu.o 2025-09-07T07:16:15.9347573Z #34 1321.2 [134/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_sparse_hdim128_bf16_sm80.cu.o 2025-09-07T07:16:16.0606214Z #34 1321.3 [135/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_sparse_hdim128_fp16_causal_sm80.cu.o 2025-09-07T07:16:16.8725892Z #34 1322.2 [136/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_sparse_hdim128_fp16_sm80.cu.o 2025-09-07T07:16:17.2629798Z #34 1322.6 [137/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim128_bf16_causal_sm80.cu.o 2025-09-07T07:16:18.1810418Z #34 1323.5 [138/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_fp16_ku4.cu.o 2025-09-07T07:16:18.5009859Z #34 1323.8 [139/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_causal_sm80.cu.o 2025-09-07T07:16:18.6583409Z #34 1323.8 [140/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim128_bf16_sm80.cu.o 2025-09-07T07:16:19.3634656Z #34 1324.7 [141/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.cu.o 
2025-09-07T07:16:19.5660095Z #34 1324.9 [142/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_bf16_ku4b8.cu.o 2025-09-07T07:16:19.7666454Z #34 1325.1 [143/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_causal_sm80.cu.o 2025-09-07T07:16:20.6480856Z #34 1325.9 [144/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.cu.o 2025-09-07T07:16:20.9778961Z #34 1326.3 [145/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim160_fp16_causal_sm80.cu.o 2025-09-07T07:16:21.1769050Z #34 1326.3 [146/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim160_fp16_sm80.cu.o 2025-09-07T07:16:21.9071812Z #34 1327.2 [147/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim192_bf16_causal_sm80.cu.o 2025-09-07T07:16:22.0327865Z #34 1327.3 [148/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim192_bf16_sm80.cu.o 2025-09-07T07:16:22.2300379Z #34 1327.5 [149/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim192_fp16_causal_sm80.cu.o 2025-09-07T07:16:23.1447920Z #34 1328.4 [150/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim192_fp16_sm80.cu.o 2025-09-07T07:16:23.5305293Z #34 1328.8 [151/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim256_bf16_causal_sm80.cu.o 2025-09-07T07:16:23.7163125Z #34 1328.9 [152/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim256_bf16_sm80.cu.o 2025-09-07T07:16:24.3796083Z #34 1329.7 [153/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim256_fp16_causal_sm80.cu.o 2025-09-07T07:16:24.5614544Z #34 1329.8 [154/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim256_fp16_sm80.cu.o 2025-09-07T07:16:24.7478107Z #34 1330.0 [155/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim32_bf16_causal_sm80.cu.o 2025-09-07T07:16:25.6737620Z #34 1331.0 [156/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim32_bf16_sm80.cu.o 2025-09-07T07:16:26.0402262Z #34 1331.3 [157/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim32_fp16_causal_sm80.cu.o 2025-09-07T07:16:26.2322100Z #34 1331.4 [158/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim32_fp16_sm80.cu.o 2025-09-07T07:16:26.9170570Z #34 1332.2 [159/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim64_bf16_causal_sm80.cu.o 2025-09-07T07:16:27.1586368Z #34 1332.4 [160/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim64_bf16_sm80.cu.o 2025-09-07T07:16:27.2617391Z #34 1332.5 [161/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim64_fp16_causal_sm80.cu.o 2025-09-07T07:16:27.3794450Z #34 1332.7 [162/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_bf16_ku8b128.cu.o 2025-09-07T07:16:28.2690959Z #34 1333.6 [163/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu.o 2025-09-07T07:16:28.5642323Z #34 1333.9 [164/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim96_bf16_causal_sm80.cu.o 2025-09-07T07:16:28.8039798Z #34 1333.9 [165/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim96_bf16_sm80.cu.o 2025-09-07T07:16:29.4136630Z #34 1334.7 [166/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim96_fp16_causal_sm80.cu.o 2025-09-07T07:16:29.6682235Z #34 1335.0 [167/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa2_C.dir/csrc/flash_attn/src/flash_fwd_split_hdim96_fp16_sm80.cu.o 2025-09-07T07:16:34.3139033Z #34 1339.6 [168/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/flash_prepare_scheduler.cu.o 2025-09-07T07:16:42.3116686Z #34 1347.6 [169/510] Building CXX object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/flash_api_torch_lib.cpp.o 2025-09-07T07:16:43.6161534Z #34 1348.9 [170/510] Building CXX object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/flash_api.cpp.o 2025-09-07T07:16:58.0674149Z #34 1363.4 [171/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_fp16_ku4b8.cu.o 2025-09-07T07:17:14.9397879Z #34 1380.2 [172/510] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/marlin_moe_wna16/kernel_fp16_ku8b128.cu.o 2025-09-07T07:17:59.3066914Z #34 1424.6 [173/510] Building CUDA object CMakeFiles/_C.dir/csrc/quantization/cutlass_w8a8/scaled_mm_c2x.cu.o 2025-09-07T07:18:31.0782500Z #34 1456.4 [174/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/flash_fwd_combine.cu.o 2025-09-07T07:20:12.7138035Z #34 1558.0 [175/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_sm90.cu.o 2025-09-07T07:20:43.3189906Z #34 1588.6 [176/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_softcap_sm90.cu.o 
2025-09-07T07:21:41.9203358Z #34 1647.2 [177/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_softcap_sm90.cu.o 2025-09-07T07:21:42.4463713Z #34 1647.7 [178/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm90.cu.o 2025-09-07T07:21:42.6566262Z #34 1647.8 [179/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm90.cu.o 2025-09-07T07:21:51.1888822Z #34 1656.5 [180/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T07:22:03.9940726Z #34 1669.3 [181/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_packgqa_sm90.cu.o 2025-09-07T07:22:30.1776704Z #34 1695.5 [182/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T07:23:40.9788262Z #34 1766.3 [183/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm90.cu.o 2025-09-07T07:23:54.3797898Z #34 1779.7 [184/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_packgqa_sm90.cu.o 2025-09-07T07:24:10.8310374Z #34 1796.1 [185/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm90.cu.o 2025-09-07T07:24:11.1968718Z #34 1796.5 [186/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_split_softcap_sm90.cu.o 2025-09-07T07:24:24.0230145Z #34 1809.3 [187/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_sm90.cu.o 2025-09-07T07:25:04.3010145Z 
#34 1849.6 [188/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_softcap_sm90.cu.o 2025-09-07T07:25:10.7771150Z #34 1856.1 [189/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_softcap_sm90.cu.o 2025-09-07T07:25:16.8686403Z #34 1862.2 [190/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm90.cu.o 2025-09-07T07:25:19.0653015Z #34 1864.4 [191/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T07:25:45.7802567Z #34 1891.1 [192/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T07:27:32.9363841Z #34 1998.2 [193/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_split_sm90.cu.o 2025-09-07T07:27:43.0665123Z #34 2008.4 [194/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_split_softcap_sm90.cu.o 2025-09-07T07:27:51.1610352Z #34 2016.4 [195/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_sm90.cu.o 2025-09-07T07:27:55.1044317Z #34 2020.4 [196/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_packgqa_sm90.cu.o 2025-09-07T07:27:59.0770277Z #34 2024.4 [197/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_sm90.cu.o 2025-09-07T07:28:09.8885608Z #34 2035.2 [198/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_softcap_sm90.cu.o 2025-09-07T07:28:18.9008048Z #34 2044.2 [199/510] 
Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_softcap_sm90.cu.o 2025-09-07T07:28:56.4982024Z #34 2081.8 [200/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_split_sm90.cu.o 2025-09-07T07:29:03.8054262Z #34 2089.1 [201/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T07:29:04.6637592Z #34 2090.0 [202/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T07:31:21.4664134Z #34 2226.8 [203/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_split_sm90.cu.o 2025-09-07T07:31:33.3871764Z #34 2238.7 [204/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_split_softcap_sm90.cu.o 2025-09-07T07:32:15.5509146Z #34 2280.8 [205/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_sm90.cu.o 2025-09-07T07:32:30.2753088Z #34 2295.6 [206/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_softcap_sm90.cu.o 2025-09-07T07:33:00.0814270Z #34 2325.4 [207/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_packgqa_sm90.cu.o 2025-09-07T07:33:06.6282964Z #34 2331.9 [208/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_sm90.cu.o 2025-09-07T07:33:07.7516331Z #34 2333.0 [209/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_softcap_sm90.cu.o 2025-09-07T07:33:27.6040274Z #34 2352.9 [210/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_split_sm90.cu.o 2025-09-07T07:33:43.1675443Z #34 2368.5 [211/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T07:34:14.4205666Z #34 2399.7 [212/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T07:35:46.6554818Z #34 2491.9 [213/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_packgqa_sm90.cu.o 2025-09-07T07:35:48.8538892Z #34 2494.1 [214/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_sm90.cu.o 2025-09-07T07:35:56.9635312Z #34 2502.3 [215/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_sm90.cu.o 2025-09-07T07:36:19.1596226Z #34 2524.4 [216/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_softcap_sm90.cu.o 2025-09-07T07:36:32.3844781Z #34 2537.7 [217/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T07:36:32.5693484Z #34 2537.7 [218/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_split_sm90.cu.o 2025-09-07T07:36:34.6353377Z #34 2539.9 [219/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_split_sm90.cu.o 2025-09-07T07:36:43.4792817Z #34 2548.8 [220/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_softcap_sm90.cu.o 2025-09-07T07:36:50.8504756Z #34 2556.1 [221/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_split_softcap_sm90.cu.o 2025-09-07T07:37:14.5983709Z #34 2579.9 [222/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T07:38:40.1742650Z #34 2665.5 [223/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_packgqa_sm90.cu.o 2025-09-07T07:38:51.1944622Z #34 2676.5 [224/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_sm90.cu.o 2025-09-07T07:39:05.3994270Z #34 2690.7 [225/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_softcap_sm90.cu.o 2025-09-07T07:39:08.3194485Z #34 2693.6 [226/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_split_sm90.cu.o 2025-09-07T07:39:09.9044670Z #34 2695.2 [227/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T07:39:13.1137918Z #34 2698.4 [228/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_sm90.cu.o 2025-09-07T07:39:23.3064133Z #34 2708.6 [229/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_split_sm90.cu.o 2025-09-07T07:39:27.5485644Z #34 2712.8 [230/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_split_softcap_sm90.cu.o 2025-09-07T07:39:36.9119688Z #34 2722.2 [231/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T07:39:46.2762157Z #34 2731.6 [232/510] Building 
CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_softcap_sm90.cu.o 2025-09-07T07:41:27.7984158Z #34 2833.1 [233/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_split_sm90.cu.o 2025-09-07T07:41:39.9279604Z #34 2845.2 [234/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_bf16_split_softcap_sm90.cu.o 2025-09-07T07:42:51.8490353Z #34 2917.1 [235/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_packgqa_sm90.cu.o 2025-09-07T07:42:51.8498211Z #34 2917.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:42:51.8509817Z #34 2917.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in 
divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:42:51.8521521Z #34 2917.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:42:51.8533233Z #34 2917.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:42:54.2248424Z #34 2919.5 [236/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_sm90.cu.o 2025-09-07T07:42:54.2254958Z #34 2919.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:42:54.2266716Z #34 2919.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:42:54.2278112Z #34 2919.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:42:54.2289195Z #34 2919.5 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:43:01.8048958Z #34 2927.1 [237/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_paged_sm90.cu.o 2025-09-07T07:43:01.8055231Z #34 2927.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1P_S1Q_S1R_' 2025-09-07T07:43:01.8068045Z #34 2927.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE0_NS3_ILb1EEES1W_EEDaiS1P_S1Q_S1R_' 
2025-09-07T07:43:01.8076815Z #34 2927.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb0ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T07:43:01.8083429Z #34 2927.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T07:43:03.9913959Z #34 2929.3 [238/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_paged_softcap_sm90.cu.o 2025-09-07T07:43:03.9926635Z #34 2929.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1P_S1Q_S1R_' 2025-09-07T07:43:03.9949837Z #34 2929.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE0_NS3_ILb1EEES1W_EEDaiS1P_S1Q_S1R_' 
2025-09-07T07:43:03.9968414Z #34 2929.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb0ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T07:43:03.9982824Z #34 2929.3 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T07:43:12.4860200Z #34 2937.8 [239/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_softcap_sm90.cu.o 2025-09-07T07:43:12.4866584Z #34 2937.8 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:43:12.4878290Z #34 2937.8 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:43:12.4889837Z #34 2937.8 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:43:12.4901795Z #34 2937.8 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:43:19.0810921Z #34 2944.4 [240/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_paged_split_sm90.cu.o 2025-09-07T07:43:19.0817165Z #34 2944.4 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1P_S1Q_S1R_' 2025-09-07T07:43:19.0828372Z #34 2944.4 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE0_NS3_ILb1EEES1W_EEDaiS1P_S1Q_S1R_' 
2025-09-07T07:43:19.0837129Z #34 2944.4 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T07:43:19.0843763Z #34 2944.4 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T07:43:24.8820359Z #34 2950.2 [241/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T07:43:24.8826848Z #34 2950.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:43:24.8838639Z #34 2950.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:43:24.8849797Z #34 2950.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:43:24.8861445Z #34 2950.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:43:25.4471463Z #34 2950.7 [242/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T07:43:25.4477863Z #34 2950.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1P_S1Q_S1R_' 2025-09-07T07:43:25.4489439Z #34 2950.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_13PipelineAsyncILi2EEES19_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S19_S19_S1C_S1E_S1G_iS1H_S1L_S1M_S1O_EUlRS1P_iE0_NS3_ILb1EEES1W_EEDaiS1P_S1Q_S1R_' 
2025-09-07T07:43:25.4499146Z #34 2950.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T07:43:25.4506297Z #34 2950.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_10bfloat16_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ESE_Li1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T07:45:19.3382878Z #34 3064.6 [243/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_split_sm90.cu.o 2025-09-07T07:45:19.3389325Z #34 3064.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:45:19.3401501Z #34 3064.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:45:19.3413992Z #34 3064.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:45:19.3425487Z #34 3064.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:45:35.4357291Z #34 3080.7 [244/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_bf16_split_softcap_sm90.cu.o 2025-09-07T07:45:35.4363760Z #34 3080.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:45:35.4375851Z #34 3080.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:45:35.4387282Z #34 3080.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1R_S1S_S1T_' 2025-09-07T07:45:35.4399016Z #34 3080.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass10bfloat16_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb0ESB_Li1EE3mmaINS_16FlashAttnFwdSm90ISE_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEEST_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_ST_NS3_ILi4EEEEEENS3_ILi0EEESZ_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSE_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1B_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSF_ISO_S12_S14_EbS17_S1B_S1B_S1E_S1G_S1I_iS1J_S1N_S1O_S1Q_EUlRS1R_iE0_NS3_ILb1EEES1Y_EEDaiS1R_S1S_S1T_' 2025-09-07T07:46:32.1992920Z #34 3137.5 [245/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_packgqa_sm90.cu.o 2025-09-07T07:46:36.8526195Z #34 3142.1 [246/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_sm90.cu.o 2025-09-07T07:46:41.1898855Z #34 3146.5 [247/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_paged_sm90.cu.o 2025-09-07T07:46:43.9688127Z #34 3149.3 [248/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_softcap_sm90.cu.o 2025-09-07T07:46:49.8529766Z #34 3155.1 [249/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_paged_softcap_sm90.cu.o 2025-09-07T07:46:59.4249661Z #34 3164.7 [250/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_paged_split_sm90.cu.o 
2025-09-07T07:47:08.2451528Z #34 3173.5 [251/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_softcap_packgqa_sm90.cu.o 2025-09-07T07:47:08.3840101Z #34 3173.7 [252/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_paged_split_softcap_sm90.cu.o 2025-09-07T07:49:02.4021765Z #34 3287.7 [253/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_sm80.cu.o 2025-09-07T07:49:06.5155828Z #34 3291.8 [254/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_split_sm90.cu.o 2025-09-07T07:49:09.6530524Z #34 3294.9 [255/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_softcap_sm80.cu.o 2025-09-07T07:49:25.2086461Z #34 3310.5 [256/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_bf16_split_softcap_sm90.cu.o 2025-09-07T07:50:02.0274152Z #34 3347.3 [257/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_sm80.cu.o 2025-09-07T07:50:07.2523167Z #34 3352.5 [258/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_softcap_sm80.cu.o 2025-09-07T07:50:09.7546983Z #34 3355.0 [259/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_split_softcapall_sm80.cu.o 2025-09-07T07:50:32.0224142Z #34 3377.3 [260/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_softcap_sm80.cu.o 2025-09-07T07:51:17.4658414Z #34 3422.8 [261/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_split_sm80.cu.o 2025-09-07T07:51:22.9196613Z #34 3428.2 [262/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_split_softcap_sm80.cu.o 2025-09-07T07:51:41.7711094Z #34 3447.1 [263/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_paged_softcapall_sm80.cu.o 2025-09-07T07:51:56.4808766Z #34 3461.8 [264/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_softcapall_sm80.cu.o 2025-09-07T07:52:11.5232288Z #34 3476.8 [265/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_bf16_split_softcapall_sm80.cu.o 2025-09-07T07:52:16.3614021Z #34 3481.6 [266/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_sm80.cu.o 2025-09-07T07:52:53.9736790Z #34 3519.3 [267/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_softcap_sm80.cu.o 2025-09-07T07:53:17.2050225Z #34 3542.5 [268/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_sm80.cu.o 2025-09-07T07:53:40.8689238Z #34 3566.2 [269/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_softcap_sm80.cu.o 2025-09-07T07:54:20.6409167Z #34 3605.9 [270/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_softcap_sm80.cu.o 2025-09-07T07:54:31.7951148Z #34 3617.1 [271/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_softcapall_sm80.cu.o 2025-09-07T07:54:53.4457925Z #34 3638.7 [272/510] Building CUDA 
object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_split_sm80.cu.o 2025-09-07T07:55:09.0472685Z #34 3654.3 [273/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_split_softcap_sm80.cu.o 2025-09-07T07:56:02.6843094Z #34 3708.0 [274/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_softcapall_sm80.cu.o 2025-09-07T07:56:07.8180151Z #34 3713.1 [275/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_sm80.cu.o 2025-09-07T07:56:07.9625501Z #34 3713.3 [276/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_paged_split_softcapall_sm80.cu.o 2025-09-07T07:56:30.1649089Z #34 3735.5 [277/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_softcap_sm80.cu.o 2025-09-07T07:56:46.1579279Z #34 3751.4 [278/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_bf16_split_softcapall_sm80.cu.o 2025-09-07T07:57:38.6719951Z #34 3804.0 [279/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_split_sm80.cu.o 2025-09-07T07:57:50.3400000Z #34 3815.6 [280/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_split_softcap_sm80.cu.o 2025-09-07T07:58:14.7544819Z #34 3840.0 [281/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_softcap_sm80.cu.o 2025-09-07T07:58:22.0733312Z #34 3847.4 [282/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_softcapall_sm80.cu.o 2025-09-07T07:58:56.3071328Z #34 3881.6 [283/510] 
Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_sm80.cu.o 2025-09-07T07:59:17.9222643Z #34 3903.2 [284/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_split_sm80.cu.o 2025-09-07T07:59:19.1024694Z #34 3904.4 [285/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_split_softcap_sm80.cu.o 2025-09-07T07:59:53.0797775Z #34 3938.4 [286/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_softcap_sm80.cu.o 2025-09-07T07:59:53.3154758Z #34 3938.5 [287/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:00:27.1072004Z #34 3972.4 [288/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_split_sm80.cu.o 2025-09-07T08:00:29.9480496Z #34 3975.2 [289/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_softcapall_sm80.cu.o 2025-09-07T08:00:30.6604650Z #34 3975.9 [290/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_split_softcap_sm80.cu.o 2025-09-07T08:00:48.4645635Z #34 3993.8 [291/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_softcapall_sm80.cu.o 2025-09-07T08:01:14.1120990Z #34 4019.4 [292/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_bf16_split_softcapall_sm80.cu.o 2025-09-07T08:01:28.9379550Z #34 4034.2 [293/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_softcap_sm80.cu.o 2025-09-07T08:01:54.7141005Z #34 4060.0 
[294/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:01:57.3169148Z #34 4062.6 [295/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_split_sm80.cu.o 2025-09-07T08:01:59.0212000Z #34 4064.3 [296/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_split_softcap_sm80.cu.o 2025-09-07T08:02:11.4482988Z #34 4076.7 [297/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_softcapall_sm80.cu.o 2025-09-07T08:02:47.1178276Z #34 4112.4 [298/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_sm80.cu.o 2025-09-07T08:02:47.8100260Z #34 4113.1 [299/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_softcap_sm80.cu.o 2025-09-07T08:03:16.8060544Z #34 4142.1 [300/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_bf16_split_softcapall_sm80.cu.o 2025-09-07T08:03:29.7438842Z #34 4155.0 [301/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_split_sm80.cu.o 2025-09-07T08:03:43.8606756Z #34 4169.1 [302/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_split_softcap_sm80.cu.o 2025-09-07T08:03:49.8155107Z #34 4175.1 [303/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_softcapall_sm80.cu.o 2025-09-07T08:04:10.1338720Z #34 4195.4 [304/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_softcap_sm80.cu.o 2025-09-07T08:04:21.5611390Z #34 4206.8 
[305/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_split_sm80.cu.o 2025-09-07T08:04:53.9198454Z #34 4239.2 [306/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_softcapall_sm80.cu.o 2025-09-07T08:04:57.1775527Z #34 4242.5 [307/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_split_softcap_sm80.cu.o 2025-09-07T08:05:00.2758442Z #34 4245.6 [308/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:05:44.7183830Z #34 4290.0 [309/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_bf16_split_softcapall_sm80.cu.o 2025-09-07T08:07:47.9285676Z #34 4413.2 [310/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_sm90.cu.o 2025-09-07T08:08:25.7730769Z #34 4451.1 [311/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_softcap_sm90.cu.o 2025-09-07T08:08:37.3579499Z #34 4462.6 [312/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm90.cu.o 2025-09-07T08:08:42.1110643Z #34 4467.4 [313/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_packgqa_sm90.cu.o 2025-09-07T08:08:55.8594858Z #34 4481.1 [314/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm90.cu.o 2025-09-07T08:09:00.7313924Z #34 4486.0 [315/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_softcap_sm90.cu.o 2025-09-07T08:09:13.3868345Z #34 4498.7 [316/510] Building CUDA 
object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T08:10:22.3460143Z #34 4567.6 [317/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T08:10:38.3098586Z #34 4583.6 [318/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm90.cu.o 2025-09-07T08:11:23.3573963Z #34 4628.6 [319/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_split_softcap_sm90.cu.o 2025-09-07T08:11:28.6513205Z #34 4633.9 [320/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_packgqa_sm90.cu.o 2025-09-07T08:11:36.1276134Z #34 4641.4 [321/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_sm90.cu.o 2025-09-07T08:11:50.8908917Z #34 4656.2 [322/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_sm90.cu.o 2025-09-07T08:12:05.5036305Z #34 4670.8 [323/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_softcap_sm90.cu.o 2025-09-07T08:12:14.9978502Z #34 4680.3 [324/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_sm90.cu.o 2025-09-07T08:12:28.9577935Z #34 4694.2 [325/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T08:12:53.5419348Z #34 4718.8 [326/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T08:12:55.0286490Z #34 4720.3 [327/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_softcap_sm90.cu.o 2025-09-07T08:14:24.8558068Z #34 4810.1 [328/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_split_sm90.cu.o 2025-09-07T08:15:01.9770022Z #34 4847.3 [329/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_sm90.cu.o 2025-09-07T08:15:10.8201344Z #34 4856.1 [330/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_packgqa_sm90.cu.o 2025-09-07T08:15:11.8395432Z #34 4857.1 [331/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_split_softcap_sm90.cu.o 2025-09-07T08:15:19.1620416Z #34 4864.4 [332/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_sm90.cu.o 2025-09-07T08:15:30.4975225Z #34 4875.8 [333/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_softcap_sm90.cu.o 2025-09-07T08:15:37.6823730Z #34 4883.0 [334/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_softcap_sm90.cu.o 2025-09-07T08:15:56.7005781Z #34 4902.0 [335/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_split_sm90.cu.o 2025-09-07T08:16:07.1832088Z #34 4912.5 [336/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T08:16:42.8686067Z #34 4948.2 [337/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T08:18:13.6116755Z #34 5038.9 [338/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_split_sm90.cu.o 2025-09-07T08:18:51.4268242Z #34 5076.7 [339/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_split_softcap_sm90.cu.o 2025-09-07T08:19:14.3073627Z #34 5099.6 [340/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_sm90.cu.o 2025-09-07T08:20:04.2345937Z #34 5149.5 [341/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_softcap_sm90.cu.o 2025-09-07T08:20:17.0387269Z #34 5162.3 [342/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_packgqa_sm90.cu.o 2025-09-07T08:20:19.2406456Z #34 5164.5 [343/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_sm90.cu.o 2025-09-07T08:20:25.6140616Z #34 5170.9 [344/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_softcap_sm90.cu.o 2025-09-07T08:20:49.5011337Z #34 5194.8 [345/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_split_sm90.cu.o 2025-09-07T08:20:59.0245395Z #34 5204.3 [346/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T08:21:09.9275805Z #34 5215.2 [347/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T08:22:45.3940370Z #34 5310.7 [348/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_packgqa_sm90.cu.o 2025-09-07T08:23:19.4710280Z #34 5344.8 [349/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_sm90.cu.o 2025-09-07T08:23:20.1411741Z #34 5345.4 [350/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_sm90.cu.o 2025-09-07T08:23:25.8495055Z #34 5351.1 [351/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_split_sm90.cu.o 2025-09-07T08:23:36.1186527Z #34 5361.4 [352/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_softcap_sm90.cu.o 2025-09-07T08:23:39.7261999Z #34 5365.0 [353/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_softcap_sm90.cu.o 2025-09-07T08:23:42.1884295Z #34 5367.5 [354/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_split_sm90.cu.o 2025-09-07T08:23:50.1197677Z #34 5375.4 [355/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T08:24:06.4424620Z #34 5391.7 [356/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_split_softcap_sm90.cu.o 2025-09-07T08:24:30.9464951Z #34 5416.2 [357/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T08:25:57.2253451Z #34 5502.5 [358/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_sm90.cu.o 2025-09-07T08:26:03.5486426Z #34 5508.8 [359/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_packgqa_sm90.cu.o 2025-09-07T08:26:08.4428327Z #34 5513.7 [360/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_softcap_sm90.cu.o 2025-09-07T08:26:15.0155784Z #34 5520.3 [361/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_split_sm90.cu.o 2025-09-07T08:26:18.2492385Z #34 5523.5 [362/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T08:26:20.2079698Z #34 5525.5 [363/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_sm90.cu.o 2025-09-07T08:26:20.7518285Z #34 5526.0 [364/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_split_sm90.cu.o 2025-09-07T08:26:50.9593640Z #34 5556.2 [365/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T08:26:58.6146344Z #34 5563.9 [366/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_split_softcap_sm90.cu.o 2025-09-07T08:27:01.6058567Z #34 5566.9 [367/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_softcap_sm90.cu.o 2025-09-07T08:28:46.0863108Z #34 5671.4 [368/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_split_sm90.cu.o 2025-09-07T08:28:51.8007773Z #34 5677.1 [369/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_fp16_split_softcap_sm90.cu.o 2025-09-07T08:29:53.6816452Z #34 5739.0 [370/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_packgqa_sm90.cu.o 2025-09-07T08:29:53.6826943Z #34 5739.0 ptxas info : (C7520) 
Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:29:53.6845933Z #34 5739.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:29:53.6865149Z #34 5739.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:29:53.6883711Z #34 5739.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:30:08.3219489Z #34 5753.6 [371/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_paged_sm90.cu.o 2025-09-07T08:30:08.3225654Z #34 5753.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1Q_S1R_S1S_' 2025-09-07T08:30:08.3236943Z #34 5753.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE0_NS3_ILb1EEES1X_EEDaiS1Q_S1R_S1S_' 
2025-09-07T08:30:08.3245669Z #34 5753.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb0ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:30:08.3252640Z #34 5753.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:30:12.3839632Z #34 5757.7 [372/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_paged_softcap_sm90.cu.o 2025-09-07T08:30:12.3852571Z #34 5757.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1Q_S1R_S1S_' 2025-09-07T08:30:12.3875397Z #34 5757.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE0_NS3_ILb1EEES1X_EEDaiS1Q_S1R_S1S_' 
2025-09-07T08:30:12.3894941Z #34 5757.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb0ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:30:12.3911359Z #34 5757.7 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb0ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:30:13.6688828Z #34 5759.0 [373/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_sm90.cu.o 2025-09-07T08:30:13.6701429Z #34 5759.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:30:13.6723304Z #34 5759.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:30:13.6746062Z #34 5759.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:30:13.6768501Z #34 5759.0 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:30:21.3544050Z #34 5766.6 [374/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_paged_split_sm90.cu.o 2025-09-07T08:30:21.3556410Z #34 5766.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1Q_S1R_S1S_' 2025-09-07T08:30:21.3575497Z #34 5766.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE0_NS3_ILb1EEES1X_EEDaiS1Q_S1R_S1S_' 
2025-09-07T08:30:21.3587911Z #34 5766.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:30:21.3599718Z #34 5766.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:30:22.8796689Z #34 5768.2 [375/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T08:30:22.8808797Z #34 5768.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1Q_S1R_S1S_' 2025-09-07T08:30:22.8823994Z #34 5768.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_13PipelineAsyncILi2EEES1A_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1A_S1A_S1D_S1F_S1H_iS1I_S1M_S1N_S1P_EUlRS1Q_iE0_NS3_ILb1EEES1X_EEDaiS1Q_S1R_S1S_' 
2025-09-07T08:30:22.8836110Z #34 5768.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ELb1ELb1ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:30:22.8846701Z #34 5768.2 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function '_ZN7cutlass13device_kernelIN5flash20enable_sm90_or_laterINS1_16FlashAttnFwdSm90INS1_25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS5_1CILi1EEES8_S8_EEENS6_IJNS7_ILi128EEENS7_ILi112EEENS7_ILi64EEEEEELi256ENS_6half_tEfNS_4arch4Sm90ELb0ELb1ELb1ELb1ELb1ELb1ELb1ELb1ELb0ELb1ELb1ELb0ENS_10bfloat16_tELi1EEENS1_21CollectiveEpilogueFwdINS6_IJSA_NS7_ILi256EEESB_EEES9_SE_SG_Li256ELb1ELb1ELb1ELb0ELi1EEENS1_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEEEEEEEvNT_6ParamsE' 2025-09-07T08:30:26.3548886Z #34 5771.6 [376/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_softcap_sm90.cu.o 2025-09-07T08:30:26.3555304Z #34 5771.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:30:26.3567221Z #34 5771.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:30:26.3578980Z #34 5771.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:30:26.3590542Z #34 5771.6 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb0ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi32ELb0ELb0ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:30:45.0964774Z #34 5790.4 [377/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T08:30:45.0975327Z #34 5790.4 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:30:45.0994902Z #34 5790.4 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:30:45.1014817Z #34 5790.4 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:30:45.1031369Z #34 5790.4 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb0ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb0ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:32:38.7686332Z #34 5904.1 [378/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_split_sm90.cu.o 2025-09-07T08:32:38.7693383Z #34 5904.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:32:38.7705137Z #34 5904.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:32:38.7716286Z #34 5904.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:32:38.7727545Z #34 5904.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:32:50.8564286Z #34 5916.1 [379/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_256_fp16_split_softcap_sm90.cu.o 2025-09-07T08:32:50.8570939Z #34 5916.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:32:50.8582741Z #34 5916.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb0ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:32:50.8594965Z #34 5916.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb0ELb1ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb0EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE1_NS3_ILb0EEENS3_ILb1EEEEEDaiS1S_S1T_S1U_' 2025-09-07T08:32:50.8608375Z #34 5916.1 ptxas info : (C7520) Potential Performance Loss: wgmma.mma_async instructions are serialized due to program dependence on compiler-inserted WG.AR in divergent path in the function 
'_ZZN5flash25CollectiveMainloopFwdSm90ILi2EN4cute5tupleIJNS1_1CILi1EEES4_S4_EEENS2_IJNS3_ILi128EEENS3_ILi112EEENS3_ILi64EEEEEELi256EN7cutlass6half_tEfNSA_4arch4Sm90ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb1ELb0ELb1ELb1ELb0ENSA_10bfloat16_tELi1EE3mmaINS_16FlashAttnFwdSm90ISF_NS_21CollectiveEpilogueFwdINS2_IJS6_NS3_ILi256EEES7_EEES5_SB_SD_Li256ELb1ELb1ELb1ELb0ELi1EEENS_36VarlenDynamicPersistentTileSchedulerILi128ELi256ELi128ELb1ELb1ELb1EEEE13SharedStorageENS1_6TensorINS1_11ArrayEngineIfLm128EEENS1_6LayoutINS2_IJNS2_IJNS3_ILi2EEESU_NS3_ILi32EEEEEES4_S4_EEENS2_IJNS2_IJS4_SU_NS3_ILi4EEEEEENS3_ILi0EEES10_EEEEEEENS_7SoftmaxILi2ELi0EEEEEbRKNSF_6ParamsENSA_25PipelineTmaAsyncNoClusterILi2ENSA_16PipelineTmaAsyncILi2EEEEES1C_RNSA_13PipelineStateILj2EEERT0_RT1_iRiRKNS_16SeqlenInfoQKNewKILb1ELb1EEENS2_IJiiiiEEERT_ENKUliT_T0_T1_E_clIZSG_ISP_S13_S15_EbS18_S1C_S1C_S1F_S1H_S1J_iS1K_S1O_S1P_S1R_EUlRS1S_iE0_NS3_ILb1EEES1Z_EEDaiS1S_S1T_S1U_' 2025-09-07T08:33:33.2911408Z #34 5958.6 [380/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_packgqa_sm90.cu.o 2025-09-07T08:33:40.0336602Z #34 5965.3 [381/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_sm90.cu.o 2025-09-07T08:33:54.5562935Z #34 5979.8 [382/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_paged_sm90.cu.o 2025-09-07T08:34:01.0003412Z #34 5986.3 [383/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_paged_softcap_sm90.cu.o 2025-09-07T08:34:03.6653000Z #34 5989.0 [384/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_softcap_sm90.cu.o 2025-09-07T08:34:09.4684082Z #34 5994.8 [385/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_softcap_packgqa_sm90.cu.o 2025-09-07T08:34:09.9086388Z #34 5995.2 [386/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_paged_split_sm90.cu.o 2025-09-07T08:34:17.1953518Z #34 6002.5 [387/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_paged_split_softcap_sm90.cu.o 2025-09-07T08:36:20.1304363Z #34 6125.4 [388/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_sm80.cu.o 2025-09-07T08:36:23.6330418Z #34 6128.9 [389/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_softcap_sm80.cu.o 2025-09-07T08:36:26.0207942Z #34 6131.3 [390/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_split_sm90.cu.o 2025-09-07T08:36:39.4073507Z #34 6144.7 [391/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_512_fp16_split_softcap_sm90.cu.o 2025-09-07T08:36:59.9766804Z #34 6165.3 [392/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_sm80.cu.o 2025-09-07T08:37:14.7085288Z #34 6180.0 [393/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_softcap_sm80.cu.o 2025-09-07T08:37:20.2719181Z #34 6185.6 [394/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:37:30.7948631Z #34 6196.1 [395/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_softcap_sm80.cu.o 2025-09-07T08:38:38.8645585Z #34 6264.2 
[396/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_split_sm80.cu.o 2025-09-07T08:38:40.1798576Z #34 6265.5 [397/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_split_softcap_sm80.cu.o 2025-09-07T08:38:56.3543076Z #34 6281.6 [398/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_paged_softcapall_sm80.cu.o 2025-09-07T08:39:01.7798119Z #34 6287.1 [399/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_softcapall_sm80.cu.o 2025-09-07T08:39:28.0926068Z #34 6313.4 [400/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_fp16_split_softcapall_sm80.cu.o 2025-09-07T08:39:29.2261045Z #34 6314.5 [401/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_sm80.cu.o 2025-09-07T08:39:50.3180633Z #34 6335.6 [402/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_softcap_sm80.cu.o 2025-09-07T08:40:26.4563916Z #34 6371.7 [403/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_sm80.cu.o 2025-09-07T08:40:37.9439832Z #34 6383.2 [404/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_softcap_sm80.cu.o 2025-09-07T08:41:29.8820405Z #34 6435.2 [405/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_softcapall_sm80.cu.o 2025-09-07T08:41:37.3513734Z #34 6442.6 [406/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_softcap_sm80.cu.o 2025-09-07T08:41:58.8979809Z #34 
6464.2 [407/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_split_sm80.cu.o 2025-09-07T08:42:25.4846881Z #34 6490.8 [408/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_split_softcap_sm80.cu.o 2025-09-07T08:43:02.6702913Z #34 6528.0 [409/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_sm80.cu.o 2025-09-07T08:43:12.9680397Z #34 6538.3 [410/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_softcapall_sm80.cu.o 2025-09-07T08:43:30.5678748Z #34 6555.9 [411/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:43:38.9610267Z #34 6564.2 [412/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_softcap_sm80.cu.o 2025-09-07T08:44:00.1484782Z #34 6585.4 [413/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_fp16_split_softcapall_sm80.cu.o 2025-09-07T08:44:46.4046315Z #34 6631.7 [414/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_split_sm80.cu.o 2025-09-07T08:44:54.3942133Z #34 6639.7 [415/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_split_softcap_sm80.cu.o 2025-09-07T08:45:17.9130621Z #34 6663.2 [416/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_softcapall_sm80.cu.o 2025-09-07T08:45:29.7402074Z #34 6675.0 [417/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_softcap_sm80.cu.o 
2025-09-07T08:46:11.6447546Z #34 6716.9 [418/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_sm80.cu.o 2025-09-07T08:46:21.3600653Z #34 6726.6 [419/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_split_sm80.cu.o 2025-09-07T08:46:40.4931969Z #34 6745.8 [420/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_split_softcap_sm80.cu.o 2025-09-07T08:46:59.0343584Z #34 6764.3 [421/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_softcap_sm80.cu.o 2025-09-07T08:46:59.2265745Z #34 6764.5 [422/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:47:25.2402964Z #34 6790.5 [423/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_split_sm80.cu.o 2025-09-07T08:47:27.5269040Z #34 6792.8 [424/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_softcapall_sm80.cu.o 2025-09-07T08:47:39.8864119Z #34 6805.2 [425/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_split_softcap_sm80.cu.o 2025-09-07T08:47:52.3532252Z #34 6817.6 [426/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_softcapall_sm80.cu.o 2025-09-07T08:48:21.3599059Z #34 6846.6 [427/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_fp16_split_softcapall_sm80.cu.o 2025-09-07T08:48:31.1507671Z #34 6856.4 [428/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_softcap_sm80.cu.o 2025-09-07T08:49:04.1128709Z #34 6889.4 [429/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_split_softcap_sm80.cu.o 2025-09-07T08:49:04.3592925Z #34 6889.5 [430/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_split_sm80.cu.o 2025-09-07T08:49:09.4496560Z #34 6894.7 [431/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:49:32.5519534Z #34 6917.8 [432/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_softcapall_sm80.cu.o 2025-09-07T08:49:43.3431491Z #34 6928.6 [433/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_sm80.cu.o 2025-09-07T08:49:55.2911670Z #34 6940.6 [434/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_softcap_sm80.cu.o 2025-09-07T08:50:15.6431603Z #34 6960.9 [435/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_fp16_split_softcapall_sm80.cu.o 2025-09-07T08:50:34.8283006Z #34 6980.1 [436/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_split_sm80.cu.o 2025-09-07T08:50:44.8371412Z #34 6990.1 [437/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_split_softcap_sm80.cu.o 2025-09-07T08:50:52.4725339Z #34 6997.8 [438/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_softcapall_sm80.cu.o 2025-09-07T08:51:16.5125333Z #34 7021.8 [439/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_softcap_sm80.cu.o 2025-09-07T08:51:42.9378957Z #34 7048.2 [440/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_split_sm80.cu.o 2025-09-07T08:51:52.6525150Z #34 7057.9 [441/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_split_softcap_sm80.cu.o 2025-09-07T08:52:03.9580507Z #34 7069.2 [442/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_softcapall_sm80.cu.o 2025-09-07T08:52:09.3316387Z #34 7074.6 [443/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_paged_split_softcapall_sm80.cu.o 2025-09-07T08:52:52.8306445Z #34 7118.1 [444/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_fp16_split_softcapall_sm80.cu.o 2025-09-07T08:53:42.3528849Z #34 7167.6 [445/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_packgqa_sm90.cu.o 2025-09-07T08:54:00.4013702Z #34 7185.7 [446/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_paged_sm90.cu.o 2025-09-07T08:54:01.7117083Z #34 7187.0 [447/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_sm90.cu.o 2025-09-07T08:54:09.6140722Z #34 7194.9 [448/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_paged_softcap_sm90.cu.o 2025-09-07T08:54:25.2361110Z #34 7210.5 [449/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_paged_split_sm90.cu.o 2025-09-07T08:54:46.6216748Z #34 7231.9 [450/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_softcap_sm90.cu.o 2025-09-07T08:54:50.0281213Z #34 7235.3 [451/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T08:55:37.1116252Z #34 7282.4 [452/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T08:55:42.7178358Z #34 7288.0 [453/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_split_sm90.cu.o 2025-09-07T08:56:44.5081106Z #34 7349.8 [454/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim128_e4m3_split_softcap_sm90.cu.o 2025-09-07T08:57:16.2742593Z #34 7381.6 [455/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_sm90.cu.o 2025-09-07T08:57:20.6844111Z #34 7386.0 [456/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_packgqa_sm90.cu.o 2025-09-07T08:57:57.6542292Z #34 7422.9 [457/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_paged_softcap_sm90.cu.o 2025-09-07T08:57:58.2584352Z #34 7423.5 [458/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_paged_sm90.cu.o 2025-09-07T08:58:13.4117882Z #34 7438.7 [459/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_paged_split_sm90.cu.o 2025-09-07T08:58:17.4142617Z #34 7442.7 [460/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_softcap_sm90.cu.o 2025-09-07T08:58:30.5681769Z #34 7455.9 [461/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T08:58:45.2886612Z #34 7470.6 [462/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T08:59:31.5151755Z #34 7516.8 [463/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_split_sm90.cu.o 2025-09-07T09:00:49.4968241Z #34 7594.8 [464/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_e4m3_split_softcap_sm90.cu.o 2025-09-07T09:00:59.0002376Z #34 7604.3 [465/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_sm90.cu.o 2025-09-07T09:01:03.5454008Z #34 7608.8 [466/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_paged_sm90.cu.o 2025-09-07T09:01:08.0338008Z #34 7613.3 [467/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_packgqa_sm90.cu.o 2025-09-07T09:01:28.3044241Z #34 7633.6 [468/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_softcap_sm90.cu.o 2025-09-07T09:01:41.8277686Z #34 7647.1 [469/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_paged_softcap_sm90.cu.o 2025-09-07T09:01:50.7691434Z #34 7656.1 [470/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_paged_split_sm90.cu.o 2025-09-07T09:02:07.8704028Z #34 7673.2 [471/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T09:02:24.5931306Z #34 7689.9 [472/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T09:03:31.5198974Z #34 7756.8 [473/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_split_sm90.cu.o 2025-09-07T09:03:55.5988580Z #34 7780.9 [474/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_sm90.cu.o 2025-09-07T09:03:56.4873564Z #34 7781.8 [475/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_packgqa_sm90.cu.o 2025-09-07T09:04:11.2977172Z #34 7796.6 [476/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_paged_sm90.cu.o 2025-09-07T09:04:15.7821443Z #34 7801.1 [477/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_paged_softcap_sm90.cu.o 2025-09-07T09:04:34.1123759Z #34 7819.4 [478/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_softcap_sm90.cu.o 2025-09-07T09:04:40.9003183Z #34 7826.2 [479/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_paged_split_sm90.cu.o 2025-09-07T09:04:53.2910505Z #34 7838.6 [480/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim256_e4m3_split_softcap_sm90.cu.o 2025-09-07T09:04:56.1385945Z #34 7841.4 [481/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T09:05:07.8032510Z #34 7853.1 [482/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T09:06:33.6970971Z #34 7939.0 [483/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_split_sm90.cu.o 2025-09-07T09:06:59.9757834Z #34 7965.3 [484/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim64_e4m3_split_softcap_sm90.cu.o 2025-09-07T09:07:08.0721505Z #34 7973.4 [485/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_sm90.cu.o 2025-09-07T09:07:12.0175964Z #34 7977.3 [486/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_packgqa_sm90.cu.o 2025-09-07T09:07:21.2917817Z #34 7986.6 [487/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_softcap_sm90.cu.o 2025-09-07T09:07:29.8972992Z #34 7995.2 [488/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_paged_sm90.cu.o 2025-09-07T09:07:33.6054660Z #34 7998.9 [489/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_paged_softcap_sm90.cu.o 2025-09-07T09:07:57.9370754Z #34 8023.2 [490/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_paged_split_sm90.cu.o 2025-09-07T09:08:05.4161069Z #34 8030.7 [491/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T09:08:13.0276960Z #34 8038.3 [492/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T09:09:49.1552337Z #34 8134.4 [493/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_packgqa_sm90.cu.o 2025-09-07T09:09:55.1636277Z #34 8140.5 [494/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_split_sm90.cu.o 2025-09-07T09:10:10.4029846Z #34 8155.7 [495/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_paged_sm90.cu.o 2025-09-07T09:10:10.8784403Z #34 8156.2 [496/510] Linking CXX shared module cumem_allocator.abi3.so 2025-09-07T09:10:12.7439014Z #34 8158.0 [497/510] Linking CXX shared module _C.abi3.so 2025-09-07T09:10:13.8650711Z #34 8159.2 [498/510] Linking CXX shared module _moe_C.abi3.so 2025-09-07T09:10:14.3955124Z #34 8159.7 [499/510] Linking CXX shared module _flashmla_C.abi3.so 2025-09-07T09:10:15.7017561Z #34 8161.0 [500/510] Linking CXX shared module vllm-flash-attn/_vllm_fa2_C.abi3.so 2025-09-07T09:10:19.3158040Z #34 8164.6 [501/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim96_e4m3_split_softcap_sm90.cu.o 2025-09-07T09:10:21.1804877Z #34 8166.5 [502/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_paged_softcap_sm90.cu.o 2025-09-07T09:10:25.7630653Z #34 8171.1 [503/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_sm90.cu.o 2025-09-07T09:10:35.9250160Z #34 8181.2 [504/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_paged_split_sm90.cu.o 2025-09-07T09:10:37.8425002Z #34 8183.1 [505/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_paged_split_softcap_sm90.cu.o 2025-09-07T09:10:53.8993993Z #34 8199.2 [506/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_softcap_sm90.cu.o 2025-09-07T09:10:57.1398963Z #34 8202.4 [507/510] Building CUDA object 
vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_softcap_packgqa_sm90.cu.o 2025-09-07T09:12:22.0767323Z #34 8287.4 [508/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_split_sm90.cu.o 2025-09-07T09:12:38.8925971Z #34 8304.2 [509/510] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdim192_128_e4m3_split_softcap_sm90.cu.o 2025-09-07T09:12:41.9814606Z #34 8307.3 [510/510] Linking CXX shared module vllm-flash-attn/_vllm_fa3_C.abi3.so 2025-09-07T09:12:42.1551833Z #34 8307.4 -- Install configuration: "Release" 2025-09-07T09:12:42.3097753Z #34 8307.4 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/_moe_C.abi3.so 2025-09-07T09:12:42.3099653Z #34 8307.4 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/_moe_C.abi3.so" to "" 2025-09-07T09:12:42.3286064Z #34 8307.6 -- Install configuration: "Release" 2025-09-07T09:12:42.4830853Z #34 8307.6 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa2_C.abi3.so 2025-09-07T09:12:42.4832050Z #34 8307.6 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa2_C.abi3.so" to "" 2025-09-07T09:12:42.4833075Z #34 8307.6 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn 2025-09-07T09:12:42.4833883Z #34 8307.6 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/__init__.py 2025-09-07T09:12:42.4834807Z #34 8307.6 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/flash_attn_interface.py 2025-09-07T09:12:42.4835884Z #34 8307.6 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers 2025-09-07T09:12:42.4836752Z #34 8307.6 -- Installing: 
/workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/__init__.py 2025-09-07T09:12:42.4837674Z #34 8307.6 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/rotary.py 2025-09-07T09:12:42.4838515Z #34 8307.6 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops 2025-09-07T09:12:42.4839338Z #34 8307.6 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton 2025-09-07T09:12:42.4840246Z #34 8307.6 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/__init__.py 2025-09-07T09:12:42.4841194Z #34 8307.6 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/rotary.py 2025-09-07T09:12:42.5023604Z #34 8307.8 -- Install configuration: "Release" 2025-09-07T09:12:42.6535678Z #34 8307.8 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa3_C.abi3.so 2025-09-07T09:12:46.2576441Z #34 8311.5 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa3_C.abi3.so" to "" 2025-09-07T09:12:46.2577712Z #34 8311.5 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn 2025-09-07T09:12:46.4084655Z #34 8311.5 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/__init__.py 2025-09-07T09:12:46.4085607Z #34 8311.5 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/flash_attn_interface.py 2025-09-07T09:12:46.4086517Z #34 8311.5 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers 2025-09-07T09:12:46.4087379Z #34 8311.5 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/__init__.py 2025-09-07T09:12:46.4088282Z #34 8311.5 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/rotary.py 
2025-09-07T09:12:46.4089113Z #34 8311.5 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops 2025-09-07T09:12:46.4089904Z #34 8311.5 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton 2025-09-07T09:12:46.4091002Z #34 8311.5 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/__init__.py 2025-09-07T09:12:46.4092114Z #34 8311.5 -- Up-to-date: /workspace/build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/rotary.py 2025-09-07T09:12:46.4340702Z #34 8311.7 -- Install configuration: "Release" 2025-09-07T09:12:46.5873155Z #34 8311.7 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/_flashmla_C.abi3.so 2025-09-07T09:12:46.5874183Z #34 8311.7 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/_flashmla_C.abi3.so" to "" 2025-09-07T09:12:46.6072011Z #34 8311.9 -- Install configuration: "Release" 2025-09-07T09:12:46.6072714Z #34 8311.9 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/cumem_allocator.abi3.so 2025-09-07T09:12:46.7602043Z #34 8311.9 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/cumem_allocator.abi3.so" to "" 2025-09-07T09:12:46.7812444Z #34 8312.1 -- Install configuration: "Release" 2025-09-07T09:12:46.7813080Z #34 8312.1 -- Installing: /workspace/build/lib.linux-x86_64-cpython-312/vllm/_C.abi3.so 2025-09-07T09:12:46.8814540Z #34 8312.1 -- Set non-toolchain portion of runtime path of "/workspace/build/lib.linux-x86_64-cpython-312/vllm/_C.abi3.so" to "" 2025-09-07T09:12:46.8815711Z #34 8312.1 Copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/__init__.py to vllm/vllm_flash_attn/__init__.py 2025-09-07T09:12:46.8816713Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/__init__.py -> vllm/vllm_flash_attn 2025-09-07T09:12:46.8818039Z #34 8312.1 Copying 
build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/flash_attn_interface.py to vllm/vllm_flash_attn/flash_attn_interface.py 2025-09-07T09:12:46.8819229Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/flash_attn_interface.py -> vllm/vllm_flash_attn 2025-09-07T09:12:46.8820370Z #34 8312.1 Copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/__init__.py to vllm/vllm_flash_attn/layers/__init__.py 2025-09-07T09:12:46.8821501Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/__init__.py -> vllm/vllm_flash_attn/layers 2025-09-07T09:12:46.8822630Z #34 8312.1 Copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/rotary.py to vllm/vllm_flash_attn/layers/rotary.py 2025-09-07T09:12:46.8823751Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/rotary.py -> vllm/vllm_flash_attn/layers 2025-09-07T09:12:46.8825084Z #34 8312.1 Copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/__init__.py to vllm/vllm_flash_attn/ops/triton/__init__.py 2025-09-07T09:12:46.8826262Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/__init__.py -> vllm/vllm_flash_attn/ops/triton 2025-09-07T09:12:46.8827439Z #34 8312.1 Copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/rotary.py to vllm/vllm_flash_attn/ops/triton/rotary.py 2025-09-07T09:12:46.8828598Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/rotary.py -> vllm/vllm_flash_attn/ops/triton 2025-09-07T09:12:46.8829907Z #34 8312.1 /opt/python/cp312-cp312/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:90: SetuptoolsDeprecationWarning: setup.py install is deprecated. 2025-09-07T09:12:46.8830799Z #34 8312.1 !! 
2025-09-07T09:12:46.8831020Z #34 8312.1 2025-09-07T09:12:46.8831316Z #34 8312.1 ******************************************************************************** 2025-09-07T09:12:46.8831766Z #34 8312.1 Please avoid running ``setup.py`` directly. 2025-09-07T09:12:46.8832259Z #34 8312.1 Instead, use pypa/build, pypa/installer or other 2025-09-07T09:12:46.8832684Z #34 8312.1 standards-based tools. 2025-09-07T09:12:46.8833014Z #34 8312.1 2025-09-07T09:12:46.8833499Z #34 8312.1 See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details. 2025-09-07T09:12:46.8834175Z #34 8312.1 ******************************************************************************** 2025-09-07T09:12:46.8834554Z #34 8312.1 2025-09-07T09:12:46.8834762Z #34 8312.1 !! 2025-09-07T09:12:46.8835019Z #34 8312.1 self.initialize_options() 2025-09-07T09:12:46.8835463Z #34 8312.1 installing to build/bdist.linux-x86_64/wheel 2025-09-07T09:12:46.8835856Z #34 8312.1 running install 2025-09-07T09:12:46.8836134Z #34 8312.1 running install_lib 2025-09-07T09:12:46.8836478Z #34 8312.1 creating build/bdist.linux-x86_64/wheel 2025-09-07T09:12:46.8836911Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm 2025-09-07T09:12:46.8837587Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8838527Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/_custom_ops.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8839451Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/_ipex_ops.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8840393Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/beam_search.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8841359Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/collect_env.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8842316Z #34 8312.1 copying 
build/lib.linux-x86_64-cpython-312/vllm/connections.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8843292Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/env_override.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8844259Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/envs.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8845214Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/forward_context.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8846176Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/logger.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8847120Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/logits_process.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8848090Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/logprobs.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8849019Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/outputs.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8849984Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/pooling_params.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8851021Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/sampling_params.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8851999Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/scalar_type.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8853228Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/scripts.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8854182Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/sequence.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8855135Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/tasks.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8856099Z #34 8312.1 copying 
build/lib.linux-x86_64-cpython-312/vllm/test_utils.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8857049Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/tracing.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8858001Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/version.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8858943Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/_version.py -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:46.8859723Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/adapter_commons 2025-09-07T09:12:46.8860689Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T09:12:46.8861920Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/layers.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T09:12:46.8863201Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/models.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T09:12:46.8864457Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/request.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T09:12:46.8865766Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T09:12:46.8867008Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/adapter_commons/worker_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/adapter_commons 2025-09-07T09:12:46.8867900Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/assets 2025-09-07T09:12:46.8868668Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/assets/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/assets 2025-09-07T09:12:46.8869707Z #34 8312.1 copying 
build/lib.linux-x86_64-cpython-312/vllm/assets/audio.py -> build/bdist.linux-x86_64/wheel/./vllm/assets 2025-09-07T09:12:46.8870733Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/assets/base.py -> build/bdist.linux-x86_64/wheel/./vllm/assets 2025-09-07T09:12:46.8871760Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/assets/image.py -> build/bdist.linux-x86_64/wheel/./vllm/assets 2025-09-07T09:12:46.8872816Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/assets/video.py -> build/bdist.linux-x86_64/wheel/./vllm/assets 2025-09-07T09:12:46.8873600Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/attention 2025-09-07T09:12:46.8874414Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention 2025-09-07T09:12:46.8875503Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/layer.py -> build/bdist.linux-x86_64/wheel/./vllm/attention 2025-09-07T09:12:46.8876613Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/selector.py -> build/bdist.linux-x86_64/wheel/./vllm/attention 2025-09-07T09:12:46.8889301Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/attention/backends 2025-09-07T09:12:46.8890406Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8891842Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/abstract.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8893779Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/differential_flash_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8895286Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/dual_chunk_flash_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 
2025-09-07T09:12:46.8896723Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/flash_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8898095Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/flashmla.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8899525Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/placeholder_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8900973Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/rocm_aiter_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8902375Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/rocm_flash_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8903869Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/triton_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8905397Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8906699Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/xformers.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends 2025-09-07T09:12:46.8907760Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/attention/backends/mla 2025-09-07T09:12:46.8908792Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/mla/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends/mla 2025-09-07T09:12:46.8910180Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/backends/mla/common.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/backends/mla 2025-09-07T09:12:46.8911170Z #34 8312.1 creating 
build/bdist.linux-x86_64/wheel/vllm/attention/layers 2025-09-07T09:12:46.8912099Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/layers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/layers 2025-09-07T09:12:46.8913427Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/layers/chunked_local_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/layers 2025-09-07T09:12:46.8914849Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/layers/encoder_only_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/layers 2025-09-07T09:12:46.8915901Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/attention/ops 2025-09-07T09:12:46.8916766Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8918063Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/chunked_prefill_paged_decode.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8919367Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/common.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8920559Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/flashmla.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8921809Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/merge_attn_states.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8923103Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/paged_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8924385Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/pallas_kv_cache_update.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8925693Z #34 8312.1 copying 
build/lib.linux-x86_64-cpython-312/vllm/attention/ops/prefix_prefill.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8926936Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/rocm_aiter_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8928215Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/rocm_aiter_paged_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8929553Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/triton_decode_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8930898Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/triton_flash_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8932299Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/triton_merge_attn_states.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8933935Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/ops/triton_unified_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/ops 2025-09-07T09:12:46.8934970Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/attention/utils 2025-09-07T09:12:46.8935909Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/utils 2025-09-07T09:12:46.8937157Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/utils/fa_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/utils 2025-09-07T09:12:46.8938474Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/attention/utils/kv_sharing_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/attention/utils 2025-09-07T09:12:46.8939437Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/benchmarks 2025-09-07T09:12:46.8940279Z #34 8312.1 copying 
build/lib.linux-x86_64-cpython-312/vllm/benchmarks/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks 2025-09-07T09:12:46.8941452Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/datasets.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks 2025-09-07T09:12:46.8942631Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/latency.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks 2025-09-07T09:12:46.8943794Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/serve.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks 2025-09-07T09:12:46.8945113Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/throughput.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks 2025-09-07T09:12:46.8945994Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/benchmarks/lib 2025-09-07T09:12:46.8946889Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks/lib 2025-09-07T09:12:46.8948168Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib/endpoint_request_func.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks/lib 2025-09-07T09:12:46.8949490Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib/ready_checker.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks/lib 2025-09-07T09:12:46.8950729Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/benchmarks/lib/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/benchmarks/lib 2025-09-07T09:12:46.8951592Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/compilation 2025-09-07T09:12:46.8952479Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8953725Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/activation_quant_fusion.py -> 
build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8954966Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/backends.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8956183Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/base_static_graph.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8957429Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/collective_fusion.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8958702Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/compiler_interface.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8959939Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/counter.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8961097Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/cuda_graph.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8962371Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/cuda_piecewise_backend.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8963622Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/decorators.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8964903Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/fix_functionalization.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8966137Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/fusion.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8967300Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/fusion_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8968473Z #34 8312.1 copying 
build/lib.linux-x86_64-cpython-312/vllm/compilation/fx_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8969666Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/inductor_pass.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8970845Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/monitor.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8972062Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/multi_output_match.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8973580Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/noop_elimination.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8974874Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/pass_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8976177Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/sequence_parallelism.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8977534Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/torch25_custom_graph_pass.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8978877Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/vllm_inductor_pass.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8980128Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/compilation/wrapper.py -> build/bdist.linux-x86_64/wheel/./vllm/compilation 2025-09-07T09:12:46.8980977Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/config 2025-09-07T09:12:46.8981777Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/config/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T09:12:46.8982879Z #34 8312.1 copying 
build/lib.linux-x86_64-cpython-312/vllm/config/cache.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T09:12:46.8983975Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/config/compilation.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T09:12:46.8985198Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/config/parallel.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T09:12:46.8986263Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/config/scheduler.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T09:12:46.8987324Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/config/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/config 2025-09-07T09:12:46.8988093Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/core 2025-09-07T09:12:46.8988823Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T09:12:46.8989860Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/block_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T09:12:46.8990897Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/evictor.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T09:12:46.8992111Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/interfaces.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T09:12:46.8993486Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/placeholder_block_space_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T09:12:46.8994766Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/scheduler.py -> build/bdist.linux-x86_64/wheel/./vllm/core 2025-09-07T09:12:46.8995588Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/core/block 2025-09-07T09:12:46.8996429Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/__init__.py -> 
build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T09:12:46.8997602Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/block_table.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T09:12:46.8998785Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/common.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T09:12:46.9000019Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/cpu_gpu_block_allocator.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T09:12:46.9001282Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/interfaces.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T09:12:46.9002490Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/naive_block.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T09:12:46.9003722Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/prefix_caching_block.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T09:12:46.9005086Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/core/block/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/core/block 2025-09-07T09:12:46.9005930Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/device_allocator 2025-09-07T09:12:46.9006843Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/device_allocator/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/device_allocator 2025-09-07T09:12:46.9008059Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/device_allocator/cumem.py -> build/bdist.linux-x86_64/wheel/./vllm/device_allocator 2025-09-07T09:12:46.9008937Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/distributed 2025-09-07T09:12:46.9009791Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T09:12:46.9011023Z #34 8312.1 copying 
build/lib.linux-x86_64-cpython-312/vllm/distributed/communication_op.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T09:12:46.9012309Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_events.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T09:12:46.9013733Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/parallel_state.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T09:12:46.9015029Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/tpu_distributed_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T09:12:46.9016294Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed 2025-09-07T09:12:46.9017264Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/distributed/device_communicators 2025-09-07T09:12:46.9018445Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9020080Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/all2all.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9021745Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/all_reduce_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9023555Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/base_device_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9025439Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/cpu_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 
2025-09-07T09:12:46.9027102Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/cuda_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9028758Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/cuda_wrapper.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9030414Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/custom_all_reduce.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9032020Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/pynccl.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9033642Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/pynccl_wrapper.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9035343Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/quick_all_reduce.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9036996Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/ray_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9038644Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/shm_broadcast.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9040245Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/symm_mem.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9041857Z #34 8312.1 copying 
build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/tpu_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9043553Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/device_communicators/xpu_communicator.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/device_communicators 2025-09-07T09:12:46.9044692Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/distributed/eplb 2025-09-07T09:12:46.9045609Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/eplb 2025-09-07T09:12:46.9046880Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb/eplb_state.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/eplb 2025-09-07T09:12:46.9048211Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb/rebalance_algo.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/eplb 2025-09-07T09:12:46.9049561Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/eplb/rebalance_execute.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/eplb 2025-09-07T09:12:46.9050585Z #34 8312.1 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer 2025-09-07T09:12:46.9051591Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer 2025-09-07T09:12:46.9053299Z #34 8312.1 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_transfer_state.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer 2025-09-07T09:12:46.9054453Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer/kv_connector 2025-09-07T09:12:46.9055666Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/__init__.py -> 
build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector 2025-09-07T09:12:46.9057342Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/base.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector 2025-09-07T09:12:46.9059008Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/factory.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector 2025-09-07T09:12:46.9060660Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector 2025-09-07T09:12:46.9061907Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T09:12:46.9063171Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T09:12:46.9064982Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/base.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T09:12:46.9066743Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T09:12:46.9068511Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T09:12:46.9070272Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T09:12:46.9072088Z #34 8312.2 
copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/shared_storage_connector.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1 2025-09-07T09:12:46.9073421Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T09:12:46.9074752Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T09:12:46.9076552Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T09:12:46.9078371Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T09:12:46.9080210Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_connector/v1/p2p 2025-09-07T09:12:46.9081553Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T09:12:46.9082792Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T09:12:46.9084458Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer/base.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T09:12:46.9086202Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer/mooncake_store.py -> 
build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T09:12:46.9087978Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_lookup_buffer/simple_buffer.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_lookup_buffer 2025-09-07T09:12:46.9089218Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/distributed/kv_transfer/kv_pipe 2025-09-07T09:12:46.9090349Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_pipe 2025-09-07T09:12:46.9091835Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe/base.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_pipe 2025-09-07T09:12:46.9093798Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_pipe 2025-09-07T09:12:46.9095431Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/kv_pipe/pynccl_pipe.py -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer/kv_pipe 2025-09-07T09:12:46.9096937Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/README.md -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer 2025-09-07T09:12:46.9098528Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/distributed/kv_transfer/disagg_prefill_workflow.jpg -> build/bdist.linux-x86_64/wheel/./vllm/distributed/kv_transfer 2025-09-07T09:12:46.9099585Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/engine 2025-09-07T09:12:46.9100387Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T09:12:46.9101477Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/arg_utils.py -> 
build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T09:12:46.9102602Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/async_llm_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T09:12:46.9103761Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/async_timeout.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T09:12:46.9105022Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/llm_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T09:12:46.9106088Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/metrics.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T09:12:46.9107175Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/metrics_types.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T09:12:46.9108262Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/protocol.py -> build/bdist.linux-x86_64/wheel/./vllm/engine 2025-09-07T09:12:46.9109134Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/engine/multiprocessing 2025-09-07T09:12:46.9110168Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/multiprocessing 2025-09-07T09:12:46.9111564Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing/client.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/multiprocessing 2025-09-07T09:12:46.9112973Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/multiprocessing/engine.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/multiprocessing 2025-09-07T09:12:46.9113999Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/engine/output_processor 2025-09-07T09:12:46.9115077Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/output_processor 2025-09-07T09:12:46.9116482Z #34 8312.2 
copying build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor/interfaces.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/output_processor 2025-09-07T09:12:46.9117952Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor/single_step.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/output_processor 2025-09-07T09:12:46.9119372Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor/stop_checker.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/output_processor 2025-09-07T09:12:46.9120774Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/engine/output_processor/util.py -> build/bdist.linux-x86_64/wheel/./vllm/engine/output_processor 2025-09-07T09:12:46.9121735Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/entrypoints 2025-09-07T09:12:46.9122570Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9123736Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/api_server.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9124908Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/chat_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9126095Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/constants.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9127300Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/context.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9128479Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/harmony_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9129685Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/launcher.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 
2025-09-07T09:12:46.9130833Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/llm.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9131972Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/logger.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9133405Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/renderer.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9134679Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/score_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9135880Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/ssl.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9137026Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/tool.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9138229Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/tool_server.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9139429Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints 2025-09-07T09:12:46.9140326Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/entrypoints/cli 2025-09-07T09:12:46.9141273Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T09:12:46.9142561Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/collect_env.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T09:12:46.9143956Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/main.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T09:12:46.9145346Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/openai.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T09:12:46.9146587Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/run_batch.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T09:12:46.9147852Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/serve.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T09:12:46.9149055Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/types.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli 2025-09-07T09:12:46.9150015Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/entrypoints/cli/benchmark 2025-09-07T09:12:46.9151135Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T09:12:46.9152631Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/base.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T09:12:46.9154088Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/latency.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T09:12:46.9155547Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/main.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T09:12:46.9156973Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/serve.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T09:12:46.9158509Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/cli/benchmark/throughput.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/cli/benchmark 2025-09-07T09:12:46.9159570Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/entrypoints/openai 
2025-09-07T09:12:46.9160535Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9161852Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/api_server.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9163172Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/cli_args.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9164542Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/logits_processors.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9165956Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/protocol.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9167277Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/run_batch.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9168628Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_chat.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9170040Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_classification.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9171501Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_completion.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9173177Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_embedding.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9174614Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_engine.py -> 
build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9176082Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_models.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9177519Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_pooling.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9178989Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_responses.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9180433Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_score.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9181899Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_tokenization.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9183393Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/serving_transcription.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9184965Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/speech_to_text.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai 2025-09-07T09:12:46.9185999Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9187145Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9188797Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/abstract_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9190491Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/deepseekv31_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9192358Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/deepseekv3_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9194274Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/glm4_moe_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9196014Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/granite_20b_fc_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9197838Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/granite_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9199575Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9201310Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9815568Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/internlm2_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9817340Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/jamba_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9819133Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/kimi_k2_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9820899Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/llama4_pythonic_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9822810Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/llama_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9824695Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/minimax_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9826378Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9828046Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/openai_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9829711Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9831391Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9833088Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/qwen3coder_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9834813Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/seed_oss_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9836471Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/step3_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9838069Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9839651Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/entrypoints/openai/tool_parsers/xlam_tool_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/entrypoints/openai/tool_parsers 2025-09-07T09:12:46.9840741Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/executor 2025-09-07T09:12:46.9841532Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/executor/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T09:12:46.9842720Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/executor/executor_base.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T09:12:46.9843929Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/executor/mp_distributed_executor.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T09:12:46.9845118Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/executor/msgspec_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T09:12:46.9846319Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/executor/multiproc_worker_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T09:12:46.9847575Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/executor/ray_distributed_executor.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T09:12:46.9848742Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/executor/ray_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T09:12:46.9849893Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/executor/uniproc_executor.py -> build/bdist.linux-x86_64/wheel/./vllm/executor 2025-09-07T09:12:46.9850732Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/inputs 2025-09-07T09:12:46.9851505Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/inputs/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/inputs 2025-09-07T09:12:46.9852823Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/inputs/data.py -> build/bdist.linux-x86_64/wheel/./vllm/inputs 2025-09-07T09:12:46.9853888Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/inputs/parse.py -> build/bdist.linux-x86_64/wheel/./vllm/inputs 2025-09-07T09:12:46.9855050Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/inputs/preprocess.py -> build/bdist.linux-x86_64/wheel/./vllm/inputs 2025-09-07T09:12:46.9856168Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/inputs/registry.py -> build/bdist.linux-x86_64/wheel/./vllm/inputs 2025-09-07T09:12:46.9857027Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/logging_utils 2025-09-07T09:12:46.9857921Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/logging_utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/logging_utils 2025-09-07T09:12:46.9859130Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/logging_utils/dump_input.py -> build/bdist.linux-x86_64/wheel/./vllm/logging_utils 2025-09-07T09:12:46.9860364Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/logging_utils/formatter.py -> build/bdist.linux-x86_64/wheel/./vllm/logging_utils 2025-09-07T09:12:46.9861233Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/lora 2025-09-07T09:12:46.9861987Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 
2025-09-07T09:12:46.9863085Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/fully_sharded_layers.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T09:12:46.9864213Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/layers.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T09:12:46.9865336Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/lora.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T09:12:46.9866332Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/models.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T09:12:46.9867353Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/peft_helper.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T09:12:46.9868389Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/request.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T09:12:46.9869406Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/resolver.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T09:12:46.9870423Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T09:12:46.9871493Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/worker_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/lora 2025-09-07T09:12:46.9872286Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/lora/ops 2025-09-07T09:12:46.9873078Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops 2025-09-07T09:12:46.9873903Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/lora/ops/ipex_ops 2025-09-07T09:12:46.9874826Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/ipex_ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/ipex_ops 2025-09-07T09:12:46.9876073Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/lora/ops/ipex_ops/lora_ops.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/ipex_ops 2025-09-07T09:12:46.9876991Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/lora/ops/torch_ops 2025-09-07T09:12:46.9877935Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/torch_ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/torch_ops 2025-09-07T09:12:46.9879186Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/torch_ops/lora_ops.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/torch_ops 2025-09-07T09:12:46.9880164Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/lora/ops/triton_ops 2025-09-07T09:12:46.9881124Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T09:12:46.9882434Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/kernel_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T09:12:46.9883824Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/lora_expand_op.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T09:12:46.9885258Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/lora_kernel_metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T09:12:46.9886636Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/lora_shrink_op.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T09:12:46.9887963Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/triton_ops/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/triton_ops 2025-09-07T09:12:46.9888893Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/lora/ops/xla_ops 2025-09-07T09:12:46.9889803Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/lora/ops/xla_ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/xla_ops 2025-09-07T09:12:46.9891017Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/ops/xla_ops/lora_ops.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/ops/xla_ops 2025-09-07T09:12:46.9892086Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/lora/punica_wrapper 2025-09-07T09:12:46.9893388Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T09:12:46.9894749Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_base.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T09:12:46.9896118Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_cpu.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T09:12:46.9897484Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_gpu.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T09:12:46.9898871Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_selector.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T09:12:46.9900314Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_tpu.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T09:12:46.9901689Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/punica_xpu.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T09:12:46.9903024Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/lora/punica_wrapper/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/lora/punica_wrapper 2025-09-07T09:12:46.9903986Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor 2025-09-07T09:12:46.9905001Z 
#34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor 2025-09-07T09:12:46.9906187Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/custom_op.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor 2025-09-07T09:12:46.9907410Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/parameter.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor 2025-09-07T09:12:46.9908660Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/sampling_metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor 2025-09-07T09:12:46.9909893Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor 2025-09-07T09:12:46.9910840Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers 2025-09-07T09:12:46.9911816Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9913207Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/activation.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9914625Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/attention_layer_base.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9916035Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/layernorm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9917415Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/lightning_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9918769Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/linear.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9920155Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/logits_processor.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9921511Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mla.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9922845Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/pooler.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9924189Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/resampler.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9925536Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/sampler.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9926851Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9928267Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/vocab_parallel_embedding.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers 2025-09-07T09:12:46.9929384Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9930511Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9932083Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 
2025-09-07T09:12:46.9934035Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/batched_triton_or_deep_gemm_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9935704Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/config.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9937296Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/cpu_fused_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9938896Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/cutlass_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9940495Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/deep_gemm_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9942147Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/deep_gemm_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9943871Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9945707Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/deepep_ll_prepare_finalize.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9947387Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9949101Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9950801Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/fused_batched_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9952404Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/fused_marlin_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9953992Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/fused_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9955585Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9957168Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/layer.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9958696Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/modular_kernel.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9960304Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/moe_align_block_size.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9961936Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/moe_pallas.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9963515Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9965155Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/moe_torch_iterative.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9966791Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/pplx_prepare_finalize.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9968414Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/prepare_finalize.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9970040Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9971671Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/routing_simulator.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9973602Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/topk_weight_and_reduce.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9975338Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/triton_deep_gemm_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9976985Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/trtllm_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9978535Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe 2025-09-07T09:12:46.9979728Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:46.9981237Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:46.9983413Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:46.9985640Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:46.9987753Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:46.9989771Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:46.9991810Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:46.9994258Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:46.9996356Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:46.9998472Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0000574Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0002668Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0004843Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0006815Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0008792Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20-3e.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0010667Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0012809Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0014858Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=352,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0017116Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0019372Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0021605Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0023708Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 
2025-09-07T09:12:47.0025758Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0027781Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0029806Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0031717Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0033711Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0035751Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0037943Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0040169Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0042318Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0044463Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0046471Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0048520Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0050545Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0052472Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=128,N=96,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0054603Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0056681Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0058679Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0060623Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_H100.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0062607Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0064745Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0066711Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0068785Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0070881Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0072913Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0074949Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0080070Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0082059Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0084158Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0086270Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0088384Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0090474Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0092964Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0095054Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0097223Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0099330Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0101419Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0103560Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0105730Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0107805Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_H20-3e.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0109702Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=320,device_name=NVIDIA_H20-3e.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0111727Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0113990Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0116184Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0118339Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0120496Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0122627Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0124826Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0126974Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325X,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0129168Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0131420Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 
2025-09-07T09:12:47.0133871Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0136124Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0138400Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0140645Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0142897Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0145259Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0147473Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0149710Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0151950Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0154138Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0156316Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0158461Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0160612Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0162789Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0164934Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0167129Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0169255Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0171315Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0173768Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0176026Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0178157Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=60,N=1408,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0180219Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=60,N=176,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0182234Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=60,N=352,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0184238Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=60,N=704,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0186316Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=62,N=256,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0188305Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=62,N=512,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0190259Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0192362Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A800-SXM4-80GB.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0194621Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0196697Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0198739Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0200743Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0202776Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0204974Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0207009Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0208997Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0210965Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0213166Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0215199Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0217275Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0219355Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0221354Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0223326Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0225399Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0227311Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0229260Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A800-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0231280Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0233384Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0235387Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0237349Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0239322Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0241247Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0243189Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0245044Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=64,N=896,device_name=NVIDIA_H20.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0246945Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=72,N=384,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0248880Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=72,N=768,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0250925Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0253221Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0255303Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0257399Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0259494Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0261611Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0263623Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0265787Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0267813Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0269820Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0271876Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0273891Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0275916Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0277931Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0279950Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0281900Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0283907Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0285865Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0287841Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0289769Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0291742Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0294185Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0296254Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0298395Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0300420Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0302488Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0304616Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0306869Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0308937Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0310876Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0312862Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0314915Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0316914Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0318923Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0320882Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0322841Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0324875Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0326967Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0329000Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0330980Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0333189Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0335161Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_L40S.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0337201Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0339328Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0341412Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0343471Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0345592Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0347642Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0349651Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0351648Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0353587Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0355556Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0357589Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0359605Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0361637Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0363601Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0365622Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0367676Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0369661Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0371635Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0373892Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0375960Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X.json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0378095Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0380174Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0382255Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0384364Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0386560Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0388622Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0390433Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs/README -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/fused_moe/configs 2025-09-07T09:12:47.0391651Z #34 8312.2 creating 
build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/mamba 2025-09-07T09:12:47.0393014Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T09:12:47.0394526Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/abstract.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T09:12:47.0396068Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/linear_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T09:12:47.0397607Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/mamba2_metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T09:12:47.0399229Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/mamba_mixer.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T09:12:47.0400768Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/mamba_mixer2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T09:12:47.0402297Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/mamba_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T09:12:47.0403868Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/short_conv.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba 2025-09-07T09:12:47.0405116Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/mamba/ops 2025-09-07T09:12:47.0406225Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 
2025-09-07T09:12:47.0407793Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/causal_conv1d.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T09:12:47.0409414Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/layernorm_gated.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T09:12:47.0411266Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/mamba_ssm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T09:12:47.0413063Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/ssd_bmm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T09:12:47.0414662Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/ssd_chunk_scan.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T09:12:47.0416303Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/ssd_chunk_state.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T09:12:47.0417939Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/ssd_combined.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T09:12:47.0419582Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/mamba/ops/ssd_state_passing.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/mamba/ops 2025-09-07T09:12:47.0420807Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0422017Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/__init__.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0423674Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/auto_round.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0425459Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/awq.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0427059Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/awq_marlin.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0428675Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/awq_triton.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0430318Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/base_config.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0432006Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/bitblas.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0433645Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/bitsandbytes.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0435349Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/deepgemm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0437005Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/deepspeedfp.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 
2025-09-07T09:12:47.0438666Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/experts_int8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0440314Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/fbgemm_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0441910Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0443503Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/gguf.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0445084Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/gptq.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0446704Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/gptq_bitblas.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0448336Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/gptq_marlin.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0449994Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/gptq_marlin_24.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0451625Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/hqq_marlin.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0453477Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/inc.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0455164Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/input_quant_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0456892Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/ipex_quant.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0458562Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kv_cache.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0460234Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/modelopt.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0461886Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/moe_wna16.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0463570Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/mxfp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0465297Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/petit.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0466913Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/ptpc_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0468504Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/rtn.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0470089Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/schema.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0471692Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/torchao.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0473311Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/tpu_int8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization 2025-09-07T09:12:47.0474616Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T09:12:47.0476071Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T09:12:47.0478109Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T09:12:47.0480248Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T09:12:47.0482332Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/triton_scaled_mm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T09:12:47.0484350Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors 2025-09-07T09:12:47.0485850Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0487429Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0489675Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_24.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0492144Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0494774Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_24.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0497283Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_nvfp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0499778Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0502210Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0504721Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_int.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0507081Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0509482Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0511814Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0514153Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/schemes 2025-09-07T09:12:47.0515838Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T09:12:47.0517455Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/linear.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T09:12:47.0519669Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/module.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T09:12:47.0521860Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/transform 2025-09-07T09:12:47.0523548Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes 2025-09-07T09:12:47.0525374Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes/linear_qutlass_nvfp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes 2025-09-07T09:12:47.0527049Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/kernels 2025-09-07T09:12:47.0528329Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels 2025-09-07T09:12:47.0529743Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0531347Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/MPLinearKernel.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0533757Z 
#34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0535917Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/allspark.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0538095Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/bitblas.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0540242Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/conch.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0542444Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/cutlass.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0544742Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/dynamic_4bit.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0546856Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/exllama.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0548946Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/machete.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0551057Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/mixed_precision 2025-09-07T09:12:47.0552569Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T09:12:47.0554096Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/ScaledMMLinearKernel.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T09:12:47.0556143Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T09:12:47.0558141Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/aiter.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T09:12:47.0560075Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T09:12:47.0562025Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/cutlass.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T09:12:47.0564016Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/triton.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 
2025-09-07T09:12:47.0565955Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/kernels/scaled_mm/xla.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/kernels/scaled_mm 2025-09-07T09:12:47.0567356Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/quark 2025-09-07T09:12:47.0568628Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark 2025-09-07T09:12:47.0570336Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/quark.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark 2025-09-07T09:12:47.0572067Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/quark_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark 2025-09-07T09:12:47.0574065Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark 2025-09-07T09:12:47.0575437Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T09:12:47.0576867Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T09:12:47.0578826Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes/quark_scheme.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T09:12:47.0580860Z #34 8312.2 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes/quark_w4a4_mxfp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T09:12:47.0582888Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T09:12:47.0585007Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_int8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/quark/schemes 2025-09-07T09:12:47.0586378Z #34 8312.2 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0587647Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0589508Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/allspark_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0591327Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/bitblas_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0593490Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0595388Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/flashinfer_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0597301Z #34 8312.2 
copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/fp8_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0599122Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/gptq_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0600973Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/int8_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0602800Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/layer_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0604755Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/machete_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0606548Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/marlin_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0608396Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/marlin_utils_fp4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0610217Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0612024Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/marlin_utils_test.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0614149Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/marlin_utils_test_24.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0616028Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/mxfp4_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0617856Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/mxfp8_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0619743Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/nvfp4_emulation_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0621662Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/nvfp4_moe_support.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0623550Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/petit_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0625456Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/quant_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0627225Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/w8a8_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils 2025-09-07T09:12:47.0628536Z #34 8312.2 creating 
build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0630291Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0632762Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0635228Z #34 8312.2 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0637694Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0640172Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0642684Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 
2025-09-07T09:12:47.0645171Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0647655Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0650076Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0652520Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0655201Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0657789Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0660344Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0662936Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0665600Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0668098Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0670525Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0672914Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0675352Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0677748Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0680167Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0682614Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0685079Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0687557Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0690089Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0692886Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0695444Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0697905Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0700413Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0702899Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0705497Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0707988Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0710453Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0712928Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0715424Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0717923Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0720396Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0722868Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0725267Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0727663Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0730120Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0732832Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0735350Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0737892Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0740480Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0743060Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0745737Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0748191Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0750638Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0753078Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0755509Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0757907Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0760301Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0762713Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0765162Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0767609Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0770079Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0772819Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0775371Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0777908Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0780392Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0782843Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0785422Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0787906Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0790377Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0793128Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0795644Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0798171Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0800626Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0803117Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0805762Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0808239Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0810675Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0813338Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0815845Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0818312Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0820823Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0823323Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0825952Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0828463Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0830993Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0833467Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0835901Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0838357Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0840771Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0843159Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0845557Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0847946Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0850318Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0853012Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0855571Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0858129Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0860723Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0863267Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0865844Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0868288Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0870775Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0873194Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0875611Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0878011Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0880391Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0882760Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0885187Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0887645Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0890085Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0893006Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0895621Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0898138Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0900643Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0903148Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0905732Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0908123Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0910557Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0913006Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0915586Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0918024Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0920467Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0922849Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0925275Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0927698Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0943153Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0945821Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0948305Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0950809Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0953227Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0955625Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0958047Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0960429Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0962800Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0965216Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0967591Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0970019Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0972789Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0975399Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0977964Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0980541Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0983115Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0985750Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0988063Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0990385Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0993125Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0995667Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.0998294Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1000841Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1003402Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1006076Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1008541Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1010914Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1013578Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1016141Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1018669Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1021191Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1023751Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1026320Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1028705Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1031068Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1033374Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1035729Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1038148Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1040597Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1043021Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1045157Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/README.md -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1047325Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1049702Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1052078Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1054768Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1057246Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1059723Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1062245Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1064704Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1067200Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1069635Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1072067Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1074477Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1076904Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1079313Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1081672Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1084028Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1086357Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1088685Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1091000Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1093950Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1096465Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1099003Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1101596Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1104166Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1106832Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1109183Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1111504Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1113931Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1116276Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1118661Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1121064Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1123427Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1125789Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1128175Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1130489Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1133075Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1135566Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1138134Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1140669Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1143175Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1145710Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1148013Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1150315Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1152664Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1155043Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1157445Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1159872Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1162260Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1164691Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/quantization/utils/configs 2025-09-07T09:12:47.1166282Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/rotary_embedding 
2025-09-07T09:12:47.1167470Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1169078Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/base.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1170672Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/common.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1172393Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1174346Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/dual_chunk_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1176212Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/dynamic_ntk_alpha_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1178068Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/dynamic_ntk_scaling_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1179876Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/ernie45_vl_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1181673Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/linear_scaling_rope.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1183453Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/llama3_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1185315Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/llama4_vision_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1186965Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/mrope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1188616Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/ntk_scaling_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1190349Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/phi3_long_rope_scaled_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1192374Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/rotary_embedding/yarn_scaling_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/rotary_embedding 2025-09-07T09:12:47.1193836Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/layers/shared_fused_moe 2025-09-07T09:12:47.1195071Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/shared_fused_moe/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/shared_fused_moe 2025-09-07T09:12:47.1196839Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/shared_fused_moe/shared_fused_moe.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/layers/shared_fused_moe 2025-09-07T09:12:47.1198071Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/model_loader 2025-09-07T09:12:47.1199168Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1200708Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/base_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1202259Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/bitsandbytes_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1203848Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/default_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1205576Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/dummy_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1206984Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/gguf_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1208497Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/runai_streamer_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1210018Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/sharded_state_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1211477Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/tensorizer.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1213174Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/tensorizer_loader.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1214682Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/tpu.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1216127Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1217618Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/model_loader/weight_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/model_loader 2025-09-07T09:12:47.1218698Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/models 2025-09-07T09:12:47.1219712Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1221142Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/adapters.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1222502Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/aimv2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1223883Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/apertus.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1225343Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/arcee.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1226647Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/arctic.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1227918Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/aria.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1229195Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/aya_vision.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1230534Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/baichuan.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1231859Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bailing_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1233145Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bamba.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1234416Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bart.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1235681Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bert.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1236979Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bert_with_rope.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1238321Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/blip.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1239591Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/blip2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 
2025-09-07T09:12:47.1240848Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/bloom.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1242321Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/chameleon.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1243661Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/chatglm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1245217Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/clip.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1246576Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/cohere2_vision.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1247946Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/commandr.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1249286Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/config.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1250705Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/constant_size_cache.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1252064Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/dbrx.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1253677Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/deepseek.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1255104Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/deepseek_eagle.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1256532Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/deepseek_mtp.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1257993Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/deepseek_v2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1259398Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/deepseek_vl2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1260820Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/donut.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1262181Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/dots1.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1263530Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ernie45.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1265135Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ernie45_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1266459Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ernie45_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1267769Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ernie45_vl_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1269125Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ernie_mtp.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1270423Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/exaone.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1271699Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/exaone4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1273035Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/fairseq2_llama.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1274344Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/falcon.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1275644Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/falcon_h1.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1276960Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/florence2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1278233Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/fuyu.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1279503Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1280806Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1282070Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1283369Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma3_mm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 
2025-09-07T09:12:47.1284664Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma3n.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1285954Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gemma3n_mm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1287268Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1288516Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1289801Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm4_1v.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1291125Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm4_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1292990Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm4_moe_mtp.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1294371Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/glm4v.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1295733Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gpt2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1297117Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gpt_bigcode.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1298541Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gpt_j.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1299900Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gpt_neox.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1301269Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gpt_oss.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1302637Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/granite.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1304057Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/granite_speech.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1305680Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/granitemoe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1307052Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/granitemoehybrid.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1308463Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/granitemoeshared.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1309808Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/gritlm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1311133Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/grok1.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1312386Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/h2ovl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1313680Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/hunyuan_v1.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1315041Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/hyperclovax_vision.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1316453Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/idefics2_vision_model.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1317859Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/idefics3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1319179Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/interfaces.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1320556Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/interfaces_base.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1321896Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/intern_vit.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1323193Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/internlm2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1324525Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/internlm2_ve.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1325845Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/interns1.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1327154Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/interns1_vit.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1328509Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/internvl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1329802Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/jais.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1331053Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/jamba.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1332402Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/jina_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1333914Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/keye.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1335260Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/keye_vl1_5.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1336630Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/kimi_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1337965Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/lfm2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1339318Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llama.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1340713Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llama4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1342088Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llama4_eagle.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1343498Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llama_eagle.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1345006Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llama_eagle3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1346404Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llava.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1347727Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llava_next.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1349067Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llava_next_video.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1350427Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/llava_onevision.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1351776Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mamba.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1353040Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mamba2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1354339Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mamba_cache.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1355633Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/medusa.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1356925Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/midashenglm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1358245Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mimo.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1359511Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mimo_mtp.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1360787Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minicpm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1362091Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minicpm3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1363416Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minicpm_eagle.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1364729Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minicpmo.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1366031Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minicpmv.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1367345Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minimax_cache.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1368689Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minimax_text_01.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1370055Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/minimax_vl_01.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1371365Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mistral3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1372908Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mixtral.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1374318Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mixtral_quant.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1375707Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mllama.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1377114Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mllama4.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1378524Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mlp_speculator.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1379937Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/modernbert.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1381393Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/module_mapping.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1382785Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/molmo.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1384154Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/moonvit.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1385578Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/mpt.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1386847Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/nemotron.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1388185Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/nemotron_h.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1389500Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/nemotron_nas.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1390821Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/nemotron_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1392229Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/nvlm_d.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1393731Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/olmo.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1395072Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/olmo2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1396424Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/olmoe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1397753Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/opt.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1399092Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/orion.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1400432Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ovis.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1401834Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ovis2_5.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1403224Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/paligemma.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1404725Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/persimmon.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1406110Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1407421Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1408685Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi3v.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1409995Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi4_multimodal.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1411384Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi4flash.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1412896Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi4mm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 
2025-09-07T09:12:47.1414289Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi4mm_audio.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1415715Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phi4mm_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1417093Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/phimoe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1418505Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/pixtral.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1419877Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/plamo2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1421210Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1422554Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1423969Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_5_omni_thinker.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1425561Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_5_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1426870Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_audio.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1428171Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_moe.py -> 
build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1429456Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_rm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1430735Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen2_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1432031Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen3.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1433313Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen3_moe.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1434596Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/qwen_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1435873Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/registry.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1437193Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/roberta.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1438464Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/rvl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1439728Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/seed_oss.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1441041Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/siglip.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1442354Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/siglip2navit.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1443693Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/skyworkr1v.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1445014Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/smolvlm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1446284Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/solar.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1447603Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/stablelm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1448924Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/starcoder2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1450228Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/step3_text.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1451537Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/step3_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1453058Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/swin.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1454405Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/tarsier.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1455803Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/telechat2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 
2025-09-07T09:12:47.1457173Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/teleflm.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1458563Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/terratorch.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1459994Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/transformers.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1461434Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/ultravox.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1462811Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1464173Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/vision.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1465599Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/voxtral.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1466919Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/whisper.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1468204Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/models/zamba2.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/models 2025-09-07T09:12:47.1469137Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/model_executor/warmup 2025-09-07T09:12:47.1470089Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/warmup 2025-09-07T09:12:47.1471438Z #34 8312.3 
copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup/deep_gemm_warmup.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/warmup 2025-09-07T09:12:47.1472791Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/model_executor/warmup/kernel_warmup.py -> build/bdist.linux-x86_64/wheel/./vllm/model_executor/warmup 2025-09-07T09:12:47.1473727Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/multimodal 2025-09-07T09:12:47.1474511Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1475586Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/audio.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1476642Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/base.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1477730Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/cache.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1478813Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/hasher.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1479882Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/image.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1480961Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/inputs.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1482041Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/parse.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1483142Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/processing.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1484275Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/profiling.py 
-> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1485379Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/registry.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1486469Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1487538Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/multimodal/video.py -> build/bdist.linux-x86_64/wheel/./vllm/multimodal 2025-09-07T09:12:47.1488316Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/platforms 2025-09-07T09:12:47.1489127Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T09:12:47.1490152Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/cpu.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T09:12:47.1491196Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/cuda.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T09:12:47.1492802Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/interface.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T09:12:47.1493935Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/rocm.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T09:12:47.1495107Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/tpu.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T09:12:47.1496207Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/platforms/xpu.py -> build/bdist.linux-x86_64/wheel/./vllm/platforms 2025-09-07T09:12:47.1497004Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/plugins 2025-09-07T09:12:47.1497810Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/plugins 
2025-09-07T09:12:47.1498713Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/plugins/io_processors 2025-09-07T09:12:47.1499737Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/io_processors/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/plugins/io_processors 2025-09-07T09:12:47.1501126Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/io_processors/interface.py -> build/bdist.linux-x86_64/wheel/./vllm/plugins/io_processors 2025-09-07T09:12:47.1502150Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/plugins/lora_resolvers 2025-09-07T09:12:47.1503182Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/plugins/lora_resolvers 2025-09-07T09:12:47.1504746Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers/filesystem_resolver.py -> build/bdist.linux-x86_64/wheel/./vllm/plugins/lora_resolvers 2025-09-07T09:12:47.1506290Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/plugins/lora_resolvers/README.md -> build/bdist.linux-x86_64/wheel/./vllm/plugins/lora_resolvers 2025-09-07T09:12:47.1507197Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/profiler 2025-09-07T09:12:47.1507956Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/profiler/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/profiler 2025-09-07T09:12:47.1509056Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/profiler/layerwise_profile.py -> build/bdist.linux-x86_64/wheel/./vllm/profiler 2025-09-07T09:12:47.1510160Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/profiler/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/profiler 2025-09-07T09:12:47.1510904Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/ray 2025-09-07T09:12:47.1511611Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/ray/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/ray 2025-09-07T09:12:47.1512562Z 
#34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/ray/lazy_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/ray 2025-09-07T09:12:47.1513533Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/ray/ray_env.py -> build/bdist.linux-x86_64/wheel/./vllm/ray 2025-09-07T09:12:47.1514257Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/reasoning 2025-09-07T09:12:47.1515026Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T09:12:47.1516157Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/abs_reasoning_parsers.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T09:12:47.1517448Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/deepseek_r1_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T09:12:47.1518706Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/glm4_moe_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T09:12:47.1519946Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/gptoss_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T09:12:47.1521178Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/granite_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T09:12:47.1522441Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/hunyuan_a13b_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T09:12:47.1523731Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/mistral_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T09:12:47.1524946Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/reasoning/qwen3_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T09:12:47.1526167Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/reasoning/step3_reasoning_parser.py -> build/bdist.linux-x86_64/wheel/./vllm/reasoning 2025-09-07T09:12:47.1527056Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/third_party 2025-09-07T09:12:47.1527853Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/third_party/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/third_party 2025-09-07T09:12:47.1528936Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/third_party/pynvml.py -> build/bdist.linux-x86_64/wheel/./vllm/third_party 2025-09-07T09:12:47.1529772Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils 2025-09-07T09:12:47.1530682Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1531912Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/config.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1533488Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/detokenizer.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1534905Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/detokenizer_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1536308Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/dynamic_module.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1537684Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processor.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1539017Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/s3_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1540335Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizer.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1541706Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizer_base.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1543106Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizer_group.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1544446Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils 2025-09-07T09:12:47.1545649Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils/chat_templates 2025-09-07T09:12:47.1546785Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T09:12:47.1548309Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/registry.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T09:12:47.1549890Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_basic.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T09:12:47.1551511Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_blip2.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T09:12:47.1553173Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_chatml.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T09:12:47.1554845Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_deepseek_vl2.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T09:12:47.1556521Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_fuyu.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T09:12:47.1558177Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/chat_templates/template_minicpmv45.jinja -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/chat_templates 2025-09-07T09:12:47.1559350Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils/configs 2025-09-07T09:12:47.1560372Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1561766Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/arctic.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1563167Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/chatglm.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1564608Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/deepseek_vl2.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1566021Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/eagle.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1567394Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/falcon.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1568778Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/jais.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1570154Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/kimi_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1571535Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/medusa.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1573266Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/midashenglm.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1574828Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/mistral.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1576349Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/mlp_speculator.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1577924Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/moonvit.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1579425Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/nemotron.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1580925Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/nemotron_h.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1582447Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/nemotron_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1583967Z #34 8312.3 copying 
build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/ovis.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1585498Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/step3_vl.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1586899Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/ultravox.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs 2025-09-07T09:12:47.1588086Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils/configs/speculators 2025-09-07T09:12:47.1589294Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs/speculators 2025-09-07T09:12:47.1590937Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators/algos.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs/speculators 2025-09-07T09:12:47.1593049Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/configs/speculators/base.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/configs/speculators 2025-09-07T09:12:47.1594274Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils/processors 2025-09-07T09:12:47.1595476Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/processors 2025-09-07T09:12:47.1597043Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors/deepseek_vl2.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/processors 2025-09-07T09:12:47.1598613Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors/ovis.py -> 
build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/processors 2025-09-07T09:12:47.1600147Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/processors/ovis2_5.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/processors 2025-09-07T09:12:47.1601267Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/transformers_utils/tokenizers 2025-09-07T09:12:47.1602393Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/tokenizers 2025-09-07T09:12:47.1603941Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/transformers_utils/tokenizers/mistral.py -> build/bdist.linux-x86_64/wheel/./vllm/transformers_utils/tokenizers 2025-09-07T09:12:47.1605337Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/triton_utils 2025-09-07T09:12:47.1606155Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/triton_utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/triton_utils 2025-09-07T09:12:47.1607271Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/triton_utils/importing.py -> build/bdist.linux-x86_64/wheel/./vllm/triton_utils 2025-09-07T09:12:47.1608139Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/usage 2025-09-07T09:12:47.1608854Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/usage/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/usage 2025-09-07T09:12:47.1609851Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/usage/usage_lib.py -> build/bdist.linux-x86_64/wheel/./vllm/usage 2025-09-07T09:12:47.1610593Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/utils 2025-09-07T09:12:47.1611309Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/utils/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/utils 2025-09-07T09:12:47.1612382Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/utils/deep_gemm.py -> 
build/bdist.linux-x86_64/wheel/./vllm/utils 2025-09-07T09:12:47.1613664Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/utils/flashinfer.py -> build/bdist.linux-x86_64/wheel/./vllm/utils 2025-09-07T09:12:47.1614758Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/utils/jsontree.py -> build/bdist.linux-x86_64/wheel/./vllm/utils 2025-09-07T09:12:47.1615864Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/utils/tensor_schema.py -> build/bdist.linux-x86_64/wheel/./vllm/utils 2025-09-07T09:12:47.1616659Z #34 8312.3 creating build/bdist.linux-x86_64/wheel/vllm/v1 2025-09-07T09:12:47.1617433Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/v1/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T09:12:47.1618497Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/v1/cudagraph_dispatcher.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T09:12:47.1619623Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/v1/kv_cache_interface.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T09:12:47.1620677Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/v1/outputs.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T09:12:47.1621682Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/v1/request.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T09:12:47.1622720Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/v1/serial_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T09:12:47.1623745Z #34 8312.3 copying build/lib.linux-x86_64-cpython-312/vllm/v1/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1 2025-09-07T09:12:47.1624521Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/attention 2025-09-07T09:12:47.1625517Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention 2025-09-07T09:12:47.1626368Z #34 8312.4 creating 
build/bdist.linux-x86_64/wheel/vllm/v1/attention/backends 2025-09-07T09:12:47.1627333Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1628655Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/cpu_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1629995Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/flash_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1631364Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/flashinfer.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1632749Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/flex_attention.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1634131Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/linear_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1635500Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mamba1_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1636885Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mamba2_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1638243Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mamba_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1639591Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/pallas.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1640212Z #34 8312.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/rocm_aiter_fa.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1640846Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/short_conv_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1641500Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/tree_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1642121Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/triton_attn.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1642715Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1643361Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/xformers.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends 2025-09-07T09:12:47.1643622Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/attention/backends/mla 2025-09-07T09:12:47.1644253Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T09:12:47.1644906Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/common.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T09:12:47.1645573Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/cutlass_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T09:12:47.1646278Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/flashattn_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T09:12:47.1646945Z #34 
8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/flashmla.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T09:12:47.1647615Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/rocm_aiter_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T09:12:47.1648281Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/attention/backends/mla/triton_mla.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/attention/backends/mla 2025-09-07T09:12:47.1648466Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/core 2025-09-07T09:12:47.1648901Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T09:12:47.1649377Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/block_pool.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T09:12:47.1649892Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/encoder_cache_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T09:12:47.1650397Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/kv_cache_coordinator.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T09:12:47.1650896Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/kv_cache_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T09:12:47.1651361Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/kv_cache_utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T09:12:47.1651938Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/single_type_kv_cache_manager.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core 2025-09-07T09:12:47.1652158Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/core/sched 2025-09-07T09:12:47.1652909Z #34 8312.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T09:12:47.1653492Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/async_scheduler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T09:12:47.1654058Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/interface.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T09:12:47.1654629Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/output.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T09:12:47.1655198Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/request_queue.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T09:12:47.1655766Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/scheduler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T09:12:47.1656321Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/core/sched/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/core/sched 2025-09-07T09:12:47.1656528Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/engine 2025-09-07T09:12:47.1657034Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1657539Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/async_llm.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1658066Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/coordinator.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1658559Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/core.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1659072Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/core_client.py -> 
build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1659626Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/detokenizer.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1660157Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/exceptions.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1660665Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/llm_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1661178Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/logprobs.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1661738Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/output_processor.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1662292Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/parallel_sampling.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1662811Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/processor.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1663304Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/engine/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/engine 2025-09-07T09:12:47.1663520Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/executor 2025-09-07T09:12:47.1664032Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/executor/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/executor 2025-09-07T09:12:47.1664569Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/executor/abstract.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/executor 2025-09-07T09:12:47.1665275Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/executor/multiproc_executor.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/executor 2025-09-07T09:12:47.1665850Z #34 8312.4 
copying build/lib.linux-x86_64-cpython-312/vllm/v1/executor/ray_distributed_executor.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/executor 2025-09-07T09:12:47.1666053Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/metrics 2025-09-07T09:12:47.1666520Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T09:12:47.1667001Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/loggers.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T09:12:47.1667537Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/prometheus.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T09:12:47.1668036Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/ray_wrappers.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T09:12:47.1668510Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/reader.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T09:12:47.1669016Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/metrics/stats.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/metrics 2025-09-07T09:12:47.1669201Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/pool 2025-09-07T09:12:47.1669644Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/pool/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/pool 2025-09-07T09:12:47.1670111Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/pool/metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/pool 2025-09-07T09:12:47.1670307Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/sample 2025-09-07T09:12:47.1670765Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample 2025-09-07T09:12:47.1671255Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/metadata.py -> 
build/bdist.linux-x86_64/wheel/./vllm/v1/sample 2025-09-07T09:12:47.1671808Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/rejection_sampler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample 2025-09-07T09:12:47.1672283Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/sampler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample 2025-09-07T09:12:47.1672560Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/sample/logits_processor 2025-09-07T09:12:47.1673186Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/logits_processor 2025-09-07T09:12:47.1673824Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor/builtin.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/logits_processor 2025-09-07T09:12:47.1674485Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor/interface.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/logits_processor 2025-09-07T09:12:47.1675109Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/logits_processor/state.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/logits_processor 2025-09-07T09:12:47.1675324Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/sample/ops 2025-09-07T09:12:47.1675837Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/ops 2025-09-07T09:12:47.1676353Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops/bad_words.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/ops 2025-09-07T09:12:47.1676870Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops/logprobs.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/ops 2025-09-07T09:12:47.1677437Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops/penalties.py -> 
build/bdist.linux-x86_64/wheel/./vllm/v1/sample/ops 2025-09-07T09:12:47.1677993Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/ops/topk_topp_sampler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/ops 2025-09-07T09:12:47.1678208Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/sample/tpu 2025-09-07T09:12:47.1678718Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/tpu 2025-09-07T09:12:47.1679230Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu/metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/tpu 2025-09-07T09:12:47.1679769Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/sample/tpu/sampler.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/sample/tpu 2025-09-07T09:12:47.1679998Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/spec_decode 2025-09-07T09:12:47.1680496Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T09:12:47.1681036Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/eagle.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T09:12:47.1681553Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/medusa.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T09:12:47.1682071Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/metadata.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T09:12:47.1682577Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/metrics.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T09:12:47.1683135Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/ngram_proposer.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T09:12:47.1683626Z #34 
8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/spec_decode/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/spec_decode 2025-09-07T09:12:47.1683868Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/structured_output 2025-09-07T09:12:47.1684486Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T09:12:47.1685108Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/backend_guidance.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T09:12:47.1685777Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/backend_lm_format_enforcer.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T09:12:47.1686412Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/backend_outlines.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T09:12:47.1687011Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/backend_types.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T09:12:47.1687632Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/backend_xgrammar.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T09:12:47.1688219Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/request.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T09:12:47.2381066Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/structured_output/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/structured_output 2025-09-07T09:12:47.2381705Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/v1/worker 2025-09-07T09:12:47.2383299Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/__init__.py -> 
build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2384632Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/block_table.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2386035Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/cpu_model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2387064Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/cpu_worker.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2387570Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/gpu_input_batch.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2388162Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/gpu_model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2388648Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/gpu_worker.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2389244Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/kv_connector_model_runner_mixin.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2389819Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/lora_model_runner_mixin.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2390380Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/tpu_input_batch.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2390896Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/tpu_model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2391404Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/tpu_worker.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2391876Z #34 8312.4 copying 
build/lib.linux-x86_64-cpython-312/vllm/v1/worker/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2392576Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/worker_base.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2393288Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/xpu_model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2393862Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/v1/worker/xpu_worker.py -> build/bdist.linux-x86_64/wheel/./vllm/v1/worker 2025-09-07T09:12:47.2394056Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/worker 2025-09-07T09:12:47.2394537Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/worker/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T09:12:47.2395034Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/worker/cache_engine.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T09:12:47.2395561Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/worker/enc_dec_model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T09:12:47.2396066Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/worker/model_runner.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T09:12:47.2396583Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/worker/model_runner_base.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T09:12:47.2397043Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/worker/utils.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T09:12:47.2397524Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/worker/worker.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T09:12:47.2398008Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/worker/worker_base.py -> build/bdist.linux-x86_64/wheel/./vllm/worker 2025-09-07T09:12:47.2398407Z #34 
8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/py.typed -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:47.2398697Z #34 8312.4 creating build/bdist.linux-x86_64/wheel/vllm/vllm_flash_attn 2025-09-07T09:12:47.2399237Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/.gitkeep -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn 2025-09-07T09:12:47.2399819Z #34 8312.4 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa2_C.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn 2025-09-07T09:12:47.9573740Z #34 8313.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn 2025-09-07T09:12:48.1095313Z #34 8313.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/flash_attn_interface.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn 2025-09-07T09:12:48.1096545Z #34 8313.2 creating build/bdist.linux-x86_64/wheel/vllm/vllm_flash_attn/layers 2025-09-07T09:12:48.1097600Z #34 8313.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn/layers 2025-09-07T09:12:48.1098981Z #34 8313.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/layers/rotary.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn/layers 2025-09-07T09:12:48.1099993Z #34 8313.2 creating build/bdist.linux-x86_64/wheel/vllm/vllm_flash_attn/ops 2025-09-07T09:12:48.1100759Z #34 8313.2 creating build/bdist.linux-x86_64/wheel/vllm/vllm_flash_attn/ops/triton 2025-09-07T09:12:48.1101837Z #34 8313.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/__init__.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn/ops/triton 2025-09-07T09:12:48.1103288Z #34 8313.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/ops/triton/rotary.py -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn/ops/triton 
2025-09-07T09:12:48.1104660Z #34 8313.2 copying build/lib.linux-x86_64-cpython-312/vllm/vllm_flash_attn/_vllm_fa3_C.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm/vllm_flash_attn 2025-09-07T09:12:52.4876211Z #34 8317.8 copying build/lib.linux-x86_64-cpython-312/vllm/_moe_C.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:53.5455984Z #34 8318.8 copying build/lib.linux-x86_64-cpython-312/vllm/_flashmla_C.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:53.7037550Z #34 8318.8 copying build/lib.linux-x86_64-cpython-312/vllm/cumem_allocator.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:53.7038634Z #34 8318.8 copying build/lib.linux-x86_64-cpython-312/vllm/_C.abi3.so -> build/bdist.linux-x86_64/wheel/./vllm 2025-09-07T09:12:55.9034649Z #34 8321.2 running install_egg_info 2025-09-07T09:12:56.0791723Z #34 8321.2 Copying vllm.egg-info to build/bdist.linux-x86_64/wheel/./vllm-0.10.2rc2.dev125+g4172235ab.d20250907-py3.12.egg-info 2025-09-07T09:12:56.0792979Z #34 8321.2 running install_scripts 2025-09-07T09:13:13.2963988Z #34 8321.2 creating build/bdist.linux-x86_64/wheel/vllm-0.10.2rc2.dev125+g4172235ab.d20250907.dist-info/WHEEL 2025-09-07T09:13:13.2965196Z #34 8321.2 creating 'vllm-dist/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it 2025-09-07T09:13:13.2966013Z #34 8338.6 adding 'vllm/_C.abi3.so' 2025-09-07T09:13:14.0348698Z #34 8339.3 adding 'vllm/__init__.py' 2025-09-07T09:13:14.2343962Z #34 8339.3 adding 'vllm/_custom_ops.py' 2025-09-07T09:13:14.2344445Z #34 8339.4 adding 'vllm/_flashmla_C.abi3.so' 2025-09-07T09:13:14.2344953Z #34 8339.4 adding 'vllm/_ipex_ops.py' 2025-09-07T09:13:21.4918790Z #34 8346.8 adding 'vllm/_moe_C.abi3.so' 2025-09-07T09:13:21.8303281Z #34 8347.1 adding 'vllm/_version.py' 2025-09-07T09:13:21.9303327Z #34 8347.1 adding 'vllm/beam_search.py' 2025-09-07T09:13:21.9303763Z #34 8347.1 adding 'vllm/collect_env.py' 
2025-09-07T09:13:21.9304158Z #34 8347.1 adding 'vllm/connections.py' 2025-09-07T09:13:21.9304534Z #34 8347.1 adding 'vllm/cumem_allocator.abi3.so' 2025-09-07T09:13:21.9305246Z #34 8347.1 adding 'vllm/env_override.py' 2025-09-07T09:13:21.9305587Z #34 8347.1 adding 'vllm/envs.py' 2025-09-07T09:13:21.9305922Z #34 8347.1 adding 'vllm/forward_context.py' 2025-09-07T09:13:21.9306310Z #34 8347.1 adding 'vllm/logger.py' 2025-09-07T09:13:21.9306668Z #34 8347.1 adding 'vllm/logits_process.py' 2025-09-07T09:13:21.9307040Z #34 8347.1 adding 'vllm/logprobs.py' 2025-09-07T09:13:21.9307376Z #34 8347.1 adding 'vllm/outputs.py' 2025-09-07T09:13:21.9307731Z #34 8347.1 adding 'vllm/pooling_params.py' 2025-09-07T09:13:21.9308090Z #34 8347.1 adding 'vllm/py.typed' 2025-09-07T09:13:21.9308418Z #34 8347.1 adding 'vllm/sampling_params.py' 2025-09-07T09:13:21.9308784Z #34 8347.1 adding 'vllm/scalar_type.py' 2025-09-07T09:13:21.9309113Z #34 8347.1 adding 'vllm/scripts.py' 2025-09-07T09:13:21.9309570Z #34 8347.1 adding 'vllm/sequence.py' 2025-09-07T09:13:21.9309888Z #34 8347.1 adding 'vllm/tasks.py' 2025-09-07T09:13:21.9310219Z #34 8347.1 adding 'vllm/test_utils.py' 2025-09-07T09:13:21.9310545Z #34 8347.1 adding 'vllm/tracing.py' 2025-09-07T09:13:21.9310878Z #34 8347.1 adding 'vllm/version.py' 2025-09-07T09:13:21.9311238Z #34 8347.1 adding 'vllm/adapter_commons/__init__.py' 2025-09-07T09:13:21.9311664Z #34 8347.1 adding 'vllm/adapter_commons/layers.py' 2025-09-07T09:13:21.9312081Z #34 8347.1 adding 'vllm/adapter_commons/models.py' 2025-09-07T09:13:21.9312488Z #34 8347.1 adding 'vllm/adapter_commons/request.py' 2025-09-07T09:13:21.9312977Z #34 8347.1 adding 'vllm/adapter_commons/utils.py' 2025-09-07T09:13:21.9313408Z #34 8347.1 adding 'vllm/adapter_commons/worker_manager.py' 2025-09-07T09:13:21.9313838Z #34 8347.1 adding 'vllm/assets/__init__.py' 2025-09-07T09:13:21.9314187Z #34 8347.1 adding 'vllm/assets/audio.py' 2025-09-07T09:13:21.9314543Z #34 8347.1 adding 'vllm/assets/base.py' 
2025-09-07T09:13:21.9314928Z #34 8347.1 adding 'vllm/assets/image.py' 2025-09-07T09:13:21.9315280Z #34 8347.1 adding 'vllm/assets/video.py' 2025-09-07T09:13:21.9315636Z #34 8347.1 adding 'vllm/attention/__init__.py' 2025-09-07T09:13:21.9316023Z #34 8347.1 adding 'vllm/attention/layer.py' 2025-09-07T09:13:21.9316390Z #34 8347.1 adding 'vllm/attention/selector.py' 2025-09-07T09:13:21.9316807Z #34 8347.1 adding 'vllm/attention/backends/__init__.py' 2025-09-07T09:13:21.9317244Z #34 8347.1 adding 'vllm/attention/backends/abstract.py' 2025-09-07T09:13:21.9317767Z #34 8347.1 adding 'vllm/attention/backends/differential_flash_attn.py' 2025-09-07T09:13:21.9318414Z #34 8347.1 adding 'vllm/attention/backends/dual_chunk_flash_attn.py' 2025-09-07T09:13:21.9318921Z #34 8347.1 adding 'vllm/attention/backends/flash_attn.py' 2025-09-07T09:13:21.9319380Z #34 8347.1 adding 'vllm/attention/backends/flashmla.py' 2025-09-07T09:13:21.9319850Z #34 8347.1 adding 'vllm/attention/backends/placeholder_attn.py' 2025-09-07T09:13:21.9320358Z #34 8347.1 adding 'vllm/attention/backends/rocm_aiter_mla.py' 2025-09-07T09:13:21.9320839Z #34 8347.1 adding 'vllm/attention/backends/rocm_flash_attn.py' 2025-09-07T09:13:21.9321318Z #34 8347.1 adding 'vllm/attention/backends/triton_mla.py' 2025-09-07T09:13:21.9321755Z #34 8347.1 adding 'vllm/attention/backends/utils.py' 2025-09-07T09:13:21.9322192Z #34 8347.1 adding 'vllm/attention/backends/xformers.py' 2025-09-07T09:13:21.9322653Z #34 8347.1 adding 'vllm/attention/backends/mla/__init__.py' 2025-09-07T09:13:21.9323116Z #34 8347.1 adding 'vllm/attention/backends/mla/common.py' 2025-09-07T09:13:21.9323564Z #34 8347.1 adding 'vllm/attention/layers/__init__.py' 2025-09-07T09:13:21.9324048Z #34 8347.1 adding 'vllm/attention/layers/chunked_local_attention.py' 2025-09-07T09:13:21.9324604Z #34 8347.1 adding 'vllm/attention/layers/encoder_only_attention.py' 2025-09-07T09:13:21.9325074Z #34 8347.1 adding 'vllm/attention/ops/__init__.py' 2025-09-07T09:13:21.9325569Z 
#34 8347.1 adding 'vllm/attention/ops/chunked_prefill_paged_decode.py' 2025-09-07T09:13:21.9326071Z #34 8347.1 adding 'vllm/attention/ops/common.py' 2025-09-07T09:13:21.9326470Z #34 8347.1 adding 'vllm/attention/ops/flashmla.py' 2025-09-07T09:13:21.9326920Z #34 8347.1 adding 'vllm/attention/ops/merge_attn_states.py' 2025-09-07T09:13:21.9327402Z #34 8347.2 adding 'vllm/attention/ops/paged_attn.py' 2025-09-07T09:13:21.9327882Z #34 8347.2 adding 'vllm/attention/ops/pallas_kv_cache_update.py' 2025-09-07T09:13:21.9328365Z #34 8347.2 adding 'vllm/attention/ops/prefix_prefill.py' 2025-09-07T09:13:21.9328819Z #34 8347.2 adding 'vllm/attention/ops/rocm_aiter_mla.py' 2025-09-07T09:13:21.9329304Z #34 8347.2 adding 'vllm/attention/ops/rocm_aiter_paged_attn.py' 2025-09-07T09:13:21.9329814Z #34 8347.2 adding 'vllm/attention/ops/triton_decode_attention.py' 2025-09-07T09:13:21.9330346Z #34 8347.2 adding 'vllm/attention/ops/triton_flash_attention.py' 2025-09-07T09:13:21.9330857Z #34 8347.2 adding 'vllm/attention/ops/triton_merge_attn_states.py' 2025-09-07T09:13:21.9331392Z #34 8347.2 adding 'vllm/attention/ops/triton_unified_attention.py' 2025-09-07T09:13:21.9331899Z #34 8347.2 adding 'vllm/attention/utils/__init__.py' 2025-09-07T09:13:21.9332416Z #34 8347.2 adding 'vllm/attention/utils/fa_utils.py' 2025-09-07T09:13:21.9333039Z #34 8347.2 adding 'vllm/attention/utils/kv_sharing_utils.py' 2025-09-07T09:13:21.9333501Z #34 8347.2 adding 'vllm/benchmarks/__init__.py' 2025-09-07T09:13:21.9333917Z #34 8347.2 adding 'vllm/benchmarks/datasets.py' 2025-09-07T09:13:21.9334315Z #34 8347.2 adding 'vllm/benchmarks/latency.py' 2025-09-07T09:13:21.9334715Z #34 8347.2 adding 'vllm/benchmarks/serve.py' 2025-09-07T09:13:21.9335161Z #34 8347.2 adding 'vllm/benchmarks/throughput.py' 2025-09-07T09:13:21.9335593Z #34 8347.2 adding 'vllm/benchmarks/lib/__init__.py' 2025-09-07T09:13:21.9336065Z #34 8347.2 adding 'vllm/benchmarks/lib/endpoint_request_func.py' 2025-09-07T09:13:21.9336568Z #34 8347.2 adding 
'vllm/benchmarks/lib/ready_checker.py' 2025-09-07T09:13:21.9337009Z #34 8347.2 adding 'vllm/benchmarks/lib/utils.py' 2025-09-07T09:13:21.9337415Z #34 8347.2 adding 'vllm/compilation/__init__.py' 2025-09-07T09:13:21.9337891Z #34 8347.2 adding 'vllm/compilation/activation_quant_fusion.py' 2025-09-07T09:13:21.9338355Z #34 8347.2 adding 'vllm/compilation/backends.py' 2025-09-07T09:13:21.9338809Z #34 8347.2 adding 'vllm/compilation/base_static_graph.py' 2025-09-07T09:13:21.9339276Z #34 8347.2 adding 'vllm/compilation/collective_fusion.py' 2025-09-07T09:13:21.9339762Z #34 8347.2 adding 'vllm/compilation/compiler_interface.py' 2025-09-07T09:13:21.9340204Z #34 8347.2 adding 'vllm/compilation/counter.py' 2025-09-07T09:13:21.9340662Z #34 8347.2 adding 'vllm/compilation/cuda_graph.py' 2025-09-07T09:13:21.9341139Z #34 8347.2 adding 'vllm/compilation/cuda_piecewise_backend.py' 2025-09-07T09:13:21.9341622Z #34 8347.2 adding 'vllm/compilation/decorators.py' 2025-09-07T09:13:21.9342102Z #34 8347.2 adding 'vllm/compilation/fix_functionalization.py' 2025-09-07T09:13:21.9342573Z #34 8347.2 adding 'vllm/compilation/fusion.py' 2025-09-07T09:13:21.9342983Z #34 8347.2 adding 'vllm/compilation/fusion_attn.py' 2025-09-07T09:13:21.9343412Z #34 8347.2 adding 'vllm/compilation/fx_utils.py' 2025-09-07T09:13:21.9343836Z #34 8347.2 adding 'vllm/compilation/inductor_pass.py' 2025-09-07T09:13:21.9344275Z #34 8347.2 adding 'vllm/compilation/monitor.py' 2025-09-07T09:13:21.9344825Z #34 8347.2 adding 'vllm/compilation/multi_output_match.py' 2025-09-07T09:13:21.9345295Z #34 8347.2 adding 'vllm/compilation/noop_elimination.py' 2025-09-07T09:13:21.9345744Z #34 8347.2 adding 'vllm/compilation/pass_manager.py' 2025-09-07T09:13:21.9346199Z #34 8347.2 adding 'vllm/compilation/sequence_parallelism.py' 2025-09-07T09:13:21.9346721Z #34 8347.2 adding 'vllm/compilation/torch25_custom_graph_pass.py' 2025-09-07T09:13:21.9347209Z #34 8347.2 adding 'vllm/compilation/vllm_inductor_pass.py' 
2025-09-07T09:13:21.9347654Z #34 8347.2 adding 'vllm/compilation/wrapper.py' 2025-09-07T09:13:21.9348034Z #34 8347.2 adding 'vllm/config/__init__.py' 2025-09-07T09:13:21.9348401Z #34 8347.2 adding 'vllm/config/cache.py' 2025-09-07T09:13:21.9348766Z #34 8347.2 adding 'vllm/config/compilation.py' 2025-09-07T09:13:21.9349149Z #34 8347.2 adding 'vllm/config/parallel.py' 2025-09-07T09:13:21.9349525Z #34 8347.2 adding 'vllm/config/scheduler.py' 2025-09-07T09:13:21.9349911Z #34 8347.2 adding 'vllm/config/utils.py' 2025-09-07T09:13:21.9350266Z #34 8347.2 adding 'vllm/core/__init__.py' 2025-09-07T09:13:21.9350624Z #34 8347.2 adding 'vllm/core/block_manager.py' 2025-09-07T09:13:21.9350998Z #34 8347.2 adding 'vllm/core/evictor.py' 2025-09-07T09:13:21.9351350Z #34 8347.2 adding 'vllm/core/interfaces.py' 2025-09-07T09:13:21.9351801Z #34 8347.2 adding 'vllm/core/placeholder_block_space_manager.py' 2025-09-07T09:13:21.9352237Z #34 8347.2 adding 'vllm/core/scheduler.py' 2025-09-07T09:13:21.9352608Z #34 8347.2 adding 'vllm/core/block/__init__.py' 2025-09-07T09:13:21.9353006Z #34 8347.2 adding 'vllm/core/block/block_table.py' 2025-09-07T09:13:21.9353391Z #34 8347.2 adding 'vllm/core/block/common.py' 2025-09-07T09:13:21.9353861Z #34 8347.2 adding 'vllm/core/block/cpu_gpu_block_allocator.py' 2025-09-07T09:13:21.9354301Z #34 8347.2 adding 'vllm/core/block/interfaces.py' 2025-09-07T09:13:21.9354708Z #34 8347.2 adding 'vllm/core/block/naive_block.py' 2025-09-07T09:13:21.9355141Z #34 8347.2 adding 'vllm/core/block/prefix_caching_block.py' 2025-09-07T09:13:21.9355568Z #34 8347.2 adding 'vllm/core/block/utils.py' 2025-09-07T09:13:21.9355955Z #34 8347.2 adding 'vllm/device_allocator/__init__.py' 2025-09-07T09:13:21.9356376Z #34 8347.2 adding 'vllm/device_allocator/cumem.py' 2025-09-07T09:13:21.9356787Z #34 8347.2 adding 'vllm/distributed/__init__.py' 2025-09-07T09:13:21.9357243Z #34 8347.2 adding 'vllm/distributed/communication_op.py' 2025-09-07T09:13:21.9357679Z #34 8347.2 adding 
'vllm/distributed/kv_events.py' 2025-09-07T09:13:21.9358094Z #34 8347.2 adding 'vllm/distributed/parallel_state.py' 2025-09-07T09:13:21.9358564Z #34 8347.2 adding 'vllm/distributed/tpu_distributed_utils.py' 2025-09-07T09:13:21.9358998Z #34 8347.2 adding 'vllm/distributed/utils.py' 2025-09-07T09:13:21.9359472Z #34 8347.2 adding 'vllm/distributed/device_communicators/__init__.py' 2025-09-07T09:13:21.9360041Z #34 8347.2 adding 'vllm/distributed/device_communicators/all2all.py' 2025-09-07T09:13:21.9360636Z #34 8347.2 adding 'vllm/distributed/device_communicators/all_reduce_utils.py' 2025-09-07T09:13:21.9361333Z #34 8347.2 adding 'vllm/distributed/device_communicators/base_device_communicator.py' 2025-09-07T09:13:21.9362015Z #34 8347.2 adding 'vllm/distributed/device_communicators/cpu_communicator.py' 2025-09-07T09:13:21.9362705Z #34 8347.2 adding 'vllm/distributed/device_communicators/cuda_communicator.py' 2025-09-07T09:13:21.9363331Z #34 8347.2 adding 'vllm/distributed/device_communicators/cuda_wrapper.py' 2025-09-07T09:13:21.9363964Z #34 8347.2 adding 'vllm/distributed/device_communicators/custom_all_reduce.py' 2025-09-07T09:13:21.9364576Z #34 8347.2 adding 'vllm/distributed/device_communicators/pynccl.py' 2025-09-07T09:13:21.9365157Z #34 8347.2 adding 'vllm/distributed/device_communicators/pynccl_wrapper.py' 2025-09-07T09:13:21.9365797Z #34 8347.2 adding 'vllm/distributed/device_communicators/quick_all_reduce.py' 2025-09-07T09:13:21.9366435Z #34 8347.2 adding 'vllm/distributed/device_communicators/ray_communicator.py' 2025-09-07T09:13:21.9367077Z #34 8347.2 adding 'vllm/distributed/device_communicators/shm_broadcast.py' 2025-09-07T09:13:21.9367662Z #34 8347.2 adding 'vllm/distributed/device_communicators/symm_mem.py' 2025-09-07T09:13:21.9368270Z #34 8347.2 adding 'vllm/distributed/device_communicators/tpu_communicator.py' 2025-09-07T09:13:21.9368929Z #34 8347.2 adding 'vllm/distributed/device_communicators/xpu_communicator.py' 2025-09-07T09:13:21.9369469Z #34 8347.2 
adding 'vllm/distributed/eplb/__init__.py' 2025-09-07T09:13:21.9369910Z #34 8347.2 adding 'vllm/distributed/eplb/eplb_state.py' 2025-09-07T09:13:21.9370360Z #34 8347.2 adding 'vllm/distributed/eplb/rebalance_algo.py' 2025-09-07T09:13:21.9370852Z #34 8347.2 adding 'vllm/distributed/eplb/rebalance_execute.py' 2025-09-07T09:13:21.9371330Z #34 8347.2 adding 'vllm/distributed/kv_transfer/README.md' 2025-09-07T09:13:21.9371810Z #34 8347.2 adding 'vllm/distributed/kv_transfer/__init__.py' 2025-09-07T09:13:21.9372487Z #34 8347.2 adding 'vllm/distributed/kv_transfer/disagg_prefill_workflow.jpg' 2025-09-07T09:13:21.9373270Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_transfer_state.py' 2025-09-07T09:13:21.9373873Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/__init__.py' 2025-09-07T09:13:21.9374459Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/base.py' 2025-09-07T09:13:21.9375066Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/factory.py' 2025-09-07T09:13:21.9375657Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/utils.py' 2025-09-07T09:13:21.9376272Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/v1/__init__.py' 2025-09-07T09:13:21.9376901Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/v1/base.py' 2025-09-07T09:13:21.9377598Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py' 2025-09-07T09:13:21.9378329Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py' 2025-09-07T09:13:21.9379028Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py' 2025-09-07T09:13:21.9379777Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/v1/shared_storage_connector.py' 2025-09-07T09:13:22.0320037Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/v1/p2p/__init__.py' 2025-09-07T09:13:22.0320831Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py' 
2025-09-07T09:13:22.0321758Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_engine.py' 2025-09-07T09:13:22.0322516Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool.py' 2025-09-07T09:13:22.0323202Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_lookup_buffer/__init__.py' 2025-09-07T09:13:22.0323832Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_lookup_buffer/base.py' 2025-09-07T09:13:22.0324470Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_lookup_buffer/mooncake_store.py' 2025-09-07T09:13:22.0325167Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_lookup_buffer/simple_buffer.py' 2025-09-07T09:13:22.0325792Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_pipe/__init__.py' 2025-09-07T09:13:22.0326320Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_pipe/base.py' 2025-09-07T09:13:22.0326885Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py' 2025-09-07T09:13:22.0327541Z #34 8347.2 adding 'vllm/distributed/kv_transfer/kv_pipe/pynccl_pipe.py' 2025-09-07T09:13:22.0328034Z #34 8347.2 adding 'vllm/engine/__init__.py' 2025-09-07T09:13:22.0328396Z #34 8347.2 adding 'vllm/engine/arg_utils.py' 2025-09-07T09:13:22.0328791Z #34 8347.2 adding 'vllm/engine/async_llm_engine.py' 2025-09-07T09:13:22.0329199Z #34 8347.2 adding 'vllm/engine/async_timeout.py' 2025-09-07T09:13:22.0341315Z #34 8347.2 adding 'vllm/engine/llm_engine.py' 2025-09-07T09:13:22.0341726Z #34 8347.2 adding 'vllm/engine/metrics.py' 2025-09-07T09:13:22.0342139Z #34 8347.2 adding 'vllm/engine/metrics_types.py' 2025-09-07T09:13:22.0342551Z #34 8347.2 adding 'vllm/engine/protocol.py' 2025-09-07T09:13:22.0343001Z #34 8347.2 adding 'vllm/engine/multiprocessing/__init__.py' 2025-09-07T09:13:22.0343499Z #34 8347.2 adding 'vllm/engine/multiprocessing/client.py' 2025-09-07T09:13:22.0344105Z #34 8347.2 adding 'vllm/engine/multiprocessing/engine.py' 2025-09-07T09:13:22.0344586Z #34 8347.2 adding 
'vllm/engine/output_processor/__init__.py' 2025-09-07T09:13:22.0345079Z #34 8347.2 adding 'vllm/engine/output_processor/interfaces.py' 2025-09-07T09:13:22.0345590Z #34 8347.2 adding 'vllm/engine/output_processor/single_step.py' 2025-09-07T09:13:22.0346097Z #34 8347.2 adding 'vllm/engine/output_processor/stop_checker.py' 2025-09-07T09:13:22.0346593Z #34 8347.2 adding 'vllm/engine/output_processor/util.py' 2025-09-07T09:13:22.0347028Z #34 8347.2 adding 'vllm/entrypoints/__init__.py' 2025-09-07T09:13:22.0347432Z #34 8347.2 adding 'vllm/entrypoints/api_server.py' 2025-09-07T09:13:22.0347858Z #34 8347.2 adding 'vllm/entrypoints/chat_utils.py' 2025-09-07T09:13:22.0348391Z #34 8347.2 adding 'vllm/entrypoints/constants.py' 2025-09-07T09:13:22.0348806Z #34 8347.2 adding 'vllm/entrypoints/context.py' 2025-09-07T09:13:22.0349216Z #34 8347.2 adding 'vllm/entrypoints/harmony_utils.py' 2025-09-07T09:13:22.0349644Z #34 8347.2 adding 'vllm/entrypoints/launcher.py' 2025-09-07T09:13:22.0350025Z #34 8347.2 adding 'vllm/entrypoints/llm.py' 2025-09-07T09:13:22.0350418Z #34 8347.2 adding 'vllm/entrypoints/logger.py' 2025-09-07T09:13:22.0350822Z #34 8347.2 adding 'vllm/entrypoints/renderer.py' 2025-09-07T09:13:22.0351278Z #34 8347.2 adding 'vllm/entrypoints/score_utils.py' 2025-09-07T09:13:22.0351682Z #34 8347.2 adding 'vllm/entrypoints/ssl.py' 2025-09-07T09:13:22.0352044Z #34 8347.2 adding 'vllm/entrypoints/tool.py' 2025-09-07T09:13:22.0352496Z #34 8347.2 adding 'vllm/entrypoints/tool_server.py' 2025-09-07T09:13:22.0352894Z #34 8347.2 adding 'vllm/entrypoints/utils.py' 2025-09-07T09:13:22.0353302Z #34 8347.2 adding 'vllm/entrypoints/cli/__init__.py' 2025-09-07T09:13:22.0353730Z #34 8347.2 adding 'vllm/entrypoints/cli/collect_env.py' 2025-09-07T09:13:22.0354155Z #34 8347.2 adding 'vllm/entrypoints/cli/main.py' 2025-09-07T09:13:22.0354570Z #34 8347.2 adding 'vllm/entrypoints/cli/openai.py' 2025-09-07T09:13:22.0354985Z #34 8347.2 adding 'vllm/entrypoints/cli/run_batch.py' 
2025-09-07T09:13:22.0355409Z #34 8347.2 adding 'vllm/entrypoints/cli/serve.py' 2025-09-07T09:13:22.0355851Z #34 8347.2 adding 'vllm/entrypoints/cli/types.py' 2025-09-07T09:13:22.0356310Z #34 8347.2 adding 'vllm/entrypoints/cli/benchmark/__init__.py' 2025-09-07T09:13:22.0356787Z #34 8347.2 adding 'vllm/entrypoints/cli/benchmark/base.py' 2025-09-07T09:13:22.0357275Z #34 8347.2 adding 'vllm/entrypoints/cli/benchmark/latency.py' 2025-09-07T09:13:22.0357761Z #34 8347.2 adding 'vllm/entrypoints/cli/benchmark/main.py' 2025-09-07T09:13:22.0358231Z #34 8347.2 adding 'vllm/entrypoints/cli/benchmark/serve.py' 2025-09-07T09:13:22.0358747Z #34 8347.2 adding 'vllm/entrypoints/cli/benchmark/throughput.py' 2025-09-07T09:13:22.0359232Z #34 8347.2 adding 'vllm/entrypoints/openai/__init__.py' 2025-09-07T09:13:22.0359692Z #34 8347.3 adding 'vllm/entrypoints/openai/api_server.py' 2025-09-07T09:13:22.0360139Z #34 8347.3 adding 'vllm/entrypoints/openai/cli_args.py' 2025-09-07T09:13:22.0360631Z #34 8347.3 adding 'vllm/entrypoints/openai/logits_processors.py' 2025-09-07T09:13:22.0361110Z #34 8347.3 adding 'vllm/entrypoints/openai/protocol.py' 2025-09-07T09:13:22.0361609Z #34 8347.3 adding 'vllm/entrypoints/openai/run_batch.py' 2025-09-07T09:13:22.0362083Z #34 8347.3 adding 'vllm/entrypoints/openai/serving_chat.py' 2025-09-07T09:13:22.0362602Z #34 8347.3 adding 'vllm/entrypoints/openai/serving_classification.py' 2025-09-07T09:13:22.0363165Z #34 8347.3 adding 'vllm/entrypoints/openai/serving_completion.py' 2025-09-07T09:13:22.0363688Z #34 8347.3 adding 'vllm/entrypoints/openai/serving_embedding.py' 2025-09-07T09:13:22.0364239Z #34 8347.3 adding 'vllm/entrypoints/openai/serving_engine.py' 2025-09-07T09:13:22.0364739Z #34 8347.3 adding 'vllm/entrypoints/openai/serving_models.py' 2025-09-07T09:13:22.0365229Z #34 8347.3 adding 'vllm/entrypoints/openai/serving_pooling.py' 2025-09-07T09:13:22.0365747Z #34 8347.3 adding 'vllm/entrypoints/openai/serving_responses.py' 2025-09-07T09:13:22.0366242Z #34 
8347.3 adding 'vllm/entrypoints/openai/serving_score.py' 2025-09-07T09:13:22.0366769Z #34 8347.3 adding 'vllm/entrypoints/openai/serving_tokenization.py' 2025-09-07T09:13:22.0367341Z #34 8347.3 adding 'vllm/entrypoints/openai/serving_transcription.py' 2025-09-07T09:13:22.0367862Z #34 8347.3 adding 'vllm/entrypoints/openai/speech_to_text.py' 2025-09-07T09:13:22.0368390Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/__init__.py' 2025-09-07T09:13:22.0368999Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/abstract_tool_parser.py' 2025-09-07T09:13:22.0369704Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/deepseekv31_tool_parser.py' 2025-09-07T09:13:22.0370401Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/deepseekv3_tool_parser.py' 2025-09-07T09:13:22.0371135Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/glm4_moe_tool_parser.py' 2025-09-07T09:13:22.0371838Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/granite_20b_fc_tool_parser.py' 2025-09-07T09:13:22.0372825Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/granite_tool_parser.py' 2025-09-07T09:13:22.0373518Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py' 2025-09-07T09:13:22.0374212Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/hunyuan_a13b_tool_parser.py' 2025-09-07T09:13:22.0374941Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/internlm2_tool_parser.py' 2025-09-07T09:13:22.0375629Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/jamba_tool_parser.py' 2025-09-07T09:13:22.0376334Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/kimi_k2_tool_parser.py' 2025-09-07T09:13:22.0377067Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/llama4_pythonic_tool_parser.py' 2025-09-07T09:13:22.0377779Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/llama_tool_parser.py' 2025-09-07T09:13:22.0378457Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/minimax_tool_parser.py' 
2025-09-07T09:13:22.0379133Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py' 2025-09-07T09:13:22.0379821Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/openai_tool_parser.py' 2025-09-07T09:13:22.0380538Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py' 2025-09-07T09:13:22.0381222Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py' 2025-09-07T09:13:22.0381928Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/qwen3coder_tool_parser.py' 2025-09-07T09:13:22.0382623Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/seed_oss_tool_parser.py' 2025-09-07T09:13:22.0383306Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/step3_tool_parser.py' 2025-09-07T09:13:22.0383922Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/utils.py' 2025-09-07T09:13:22.0384618Z #34 8347.3 adding 'vllm/entrypoints/openai/tool_parsers/xlam_tool_parser.py' 2025-09-07T09:13:22.0385136Z #34 8347.3 adding 'vllm/executor/__init__.py' 2025-09-07T09:13:22.0385525Z #34 8347.3 adding 'vllm/executor/executor_base.py' 2025-09-07T09:13:22.0385984Z #34 8347.3 adding 'vllm/executor/mp_distributed_executor.py' 2025-09-07T09:13:22.0386464Z #34 8347.3 adding 'vllm/executor/msgspec_utils.py' 2025-09-07T09:13:22.0386920Z #34 8347.3 adding 'vllm/executor/multiproc_worker_utils.py' 2025-09-07T09:13:22.0387397Z #34 8347.3 adding 'vllm/executor/ray_distributed_executor.py' 2025-09-07T09:13:22.0387847Z #34 8347.3 adding 'vllm/executor/ray_utils.py' 2025-09-07T09:13:22.0388266Z #34 8347.3 adding 'vllm/executor/uniproc_executor.py' 2025-09-07T09:13:22.0388665Z #34 8347.3 adding 'vllm/inputs/__init__.py' 2025-09-07T09:13:22.0389030Z #34 8347.3 adding 'vllm/inputs/data.py' 2025-09-07T09:13:22.0389376Z #34 8347.3 adding 'vllm/inputs/parse.py' 2025-09-07T09:13:22.0389760Z #34 8347.3 adding 'vllm/inputs/preprocess.py' 2025-09-07T09:13:22.0390132Z #34 8347.3 adding 'vllm/inputs/registry.py' 
2025-09-07T09:13:22.0390522Z #34 8347.3 adding 'vllm/logging_utils/__init__.py' 2025-09-07T09:13:22.0390936Z #34 8347.3 adding 'vllm/logging_utils/dump_input.py' 2025-09-07T09:13:22.0391361Z #34 8347.3 adding 'vllm/logging_utils/formatter.py' 2025-09-07T09:13:22.0391762Z #34 8347.3 adding 'vllm/lora/__init__.py' 2025-09-07T09:13:22.0392516Z #34 8347.3 adding 'vllm/lora/fully_sharded_layers.py' 2025-09-07T09:13:22.0392930Z #34 8347.3 adding 'vllm/lora/layers.py' 2025-09-07T09:13:22.0393277Z #34 8347.3 adding 'vllm/lora/lora.py' 2025-09-07T09:13:22.0393637Z #34 8347.3 adding 'vllm/lora/models.py' 2025-09-07T09:13:22.0393998Z #34 8347.3 adding 'vllm/lora/peft_helper.py' 2025-09-07T09:13:22.0394383Z #34 8347.3 adding 'vllm/lora/request.py' 2025-09-07T09:13:22.0394739Z #34 8347.3 adding 'vllm/lora/resolver.py' 2025-09-07T09:13:22.0395108Z #34 8347.3 adding 'vllm/lora/utils.py' 2025-09-07T09:13:22.0395569Z #34 8347.3 adding 'vllm/lora/worker_manager.py' 2025-09-07T09:13:22.0395961Z #34 8347.3 adding 'vllm/lora/ops/__init__.py' 2025-09-07T09:13:22.0396384Z #34 8347.3 adding 'vllm/lora/ops/ipex_ops/__init__.py' 2025-09-07T09:13:22.0396827Z #34 8347.3 adding 'vllm/lora/ops/ipex_ops/lora_ops.py' 2025-09-07T09:13:22.0397283Z #34 8347.3 adding 'vllm/lora/ops/torch_ops/__init__.py' 2025-09-07T09:13:22.0397726Z #34 8347.3 adding 'vllm/lora/ops/torch_ops/lora_ops.py' 2025-09-07T09:13:22.0398190Z #34 8347.3 adding 'vllm/lora/ops/triton_ops/__init__.py' 2025-09-07T09:13:22.0398670Z #34 8347.3 adding 'vllm/lora/ops/triton_ops/kernel_utils.py' 2025-09-07T09:13:22.0399160Z #34 8347.3 adding 'vllm/lora/ops/triton_ops/lora_expand_op.py' 2025-09-07T09:13:22.0399753Z #34 8347.3 adding 'vllm/lora/ops/triton_ops/lora_kernel_metadata.py' 2025-09-07T09:13:22.0400282Z #34 8347.3 adding 'vllm/lora/ops/triton_ops/lora_shrink_op.py' 2025-09-07T09:13:22.0400761Z #34 8347.3 adding 'vllm/lora/ops/triton_ops/utils.py' 2025-09-07T09:13:22.0401198Z #34 8347.3 adding 'vllm/lora/ops/xla_ops/__init__.py' 
2025-09-07T09:13:22.0401643Z #34 8347.3 adding 'vllm/lora/ops/xla_ops/lora_ops.py' 2025-09-07T09:13:22.0402088Z #34 8347.3 adding 'vllm/lora/punica_wrapper/__init__.py' 2025-09-07T09:13:22.0402565Z #34 8347.3 adding 'vllm/lora/punica_wrapper/punica_base.py' 2025-09-07T09:13:22.0403109Z #34 8347.3 adding 'vllm/lora/punica_wrapper/punica_cpu.py' 2025-09-07T09:13:22.0403585Z #34 8347.3 adding 'vllm/lora/punica_wrapper/punica_gpu.py' 2025-09-07T09:13:22.0404097Z #34 8347.3 adding 'vllm/lora/punica_wrapper/punica_selector.py' 2025-09-07T09:13:22.0404691Z #34 8347.3 adding 'vllm/lora/punica_wrapper/punica_tpu.py' 2025-09-07T09:13:22.0405164Z #34 8347.3 adding 'vllm/lora/punica_wrapper/punica_xpu.py' 2025-09-07T09:13:22.0405608Z #34 8347.3 adding 'vllm/lora/punica_wrapper/utils.py' 2025-09-07T09:13:22.0406037Z #34 8347.3 adding 'vllm/model_executor/__init__.py' 2025-09-07T09:13:22.0406463Z #34 8347.3 adding 'vllm/model_executor/custom_op.py' 2025-09-07T09:13:22.0406886Z #34 8347.3 adding 'vllm/model_executor/parameter.py' 2025-09-07T09:13:22.0407346Z #34 8347.3 adding 'vllm/model_executor/sampling_metadata.py' 2025-09-07T09:13:22.0407783Z #34 8347.3 adding 'vllm/model_executor/utils.py' 2025-09-07T09:13:22.0408216Z #34 8347.3 adding 'vllm/model_executor/layers/__init__.py' 2025-09-07T09:13:22.0408730Z #34 8347.3 adding 'vllm/model_executor/layers/activation.py' 2025-09-07T09:13:22.0409263Z #34 8347.3 adding 'vllm/model_executor/layers/attention_layer_base.py' 2025-09-07T09:13:22.0409791Z #34 8347.3 adding 'vllm/model_executor/layers/layernorm.py' 2025-09-07T09:13:22.0410282Z #34 8347.3 adding 'vllm/model_executor/layers/lightning_attn.py' 2025-09-07T09:13:22.0410773Z #34 8347.3 adding 'vllm/model_executor/layers/linear.py' 2025-09-07T09:13:22.0411260Z #34 8347.3 adding 'vllm/model_executor/layers/logits_processor.py' 2025-09-07T09:13:22.0411748Z #34 8347.3 adding 'vllm/model_executor/layers/mla.py' 2025-09-07T09:13:22.0412275Z #34 8347.3 adding 
'vllm/model_executor/layers/pooler.py' 2025-09-07T09:13:22.0412749Z #34 8347.3 adding 'vllm/model_executor/layers/resampler.py' 2025-09-07T09:13:22.0413391Z #34 8347.3 adding 'vllm/model_executor/layers/sampler.py' 2025-09-07T09:13:22.0413871Z #34 8347.3 adding 'vllm/model_executor/layers/utils.py' 2025-09-07T09:13:22.0414415Z #34 8347.3 adding 'vllm/model_executor/layers/vocab_parallel_embedding.py' 2025-09-07T09:13:22.0415005Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/__init__.py' 2025-09-07T09:13:22.0415622Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py' 2025-09-07T09:13:22.0416344Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/batched_triton_or_deep_gemm_moe.py' 2025-09-07T09:13:22.0417017Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/config.py' 2025-09-07T09:13:22.0417602Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/cpu_fused_moe.py' 2025-09-07T09:13:22.0418190Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/cutlass_moe.py' 2025-09-07T09:13:22.0418826Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/deep_gemm_moe.py' 2025-09-07T09:13:22.0419429Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/deep_gemm_utils.py' 2025-09-07T09:13:22.0420110Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py' 2025-09-07T09:13:22.0420839Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/deepep_ll_prepare_finalize.py' 2025-09-07T09:13:22.0421554Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py' 2025-09-07T09:13:22.0422316Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py' 2025-09-07T09:13:22.0423043Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/fused_batched_moe.py' 2025-09-07T09:13:22.0423717Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/fused_marlin_moe.py' 2025-09-07T09:13:22.0424307Z #34 8347.3 adding 
'vllm/model_executor/layers/fused_moe/fused_moe.py' 2025-09-07T09:13:22.1320844Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe.py' 2025-09-07T09:13:22.1321832Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/layer.py' 2025-09-07T09:13:22.1322541Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/modular_kernel.py' 2025-09-07T09:13:22.1323182Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/moe_align_block_size.py' 2025-09-07T09:13:22.1323939Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/moe_pallas.py' 2025-09-07T09:13:22.1324553Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py' 2025-09-07T09:13:22.1325229Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/moe_torch_iterative.py' 2025-09-07T09:13:22.1325891Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/pplx_prepare_finalize.py' 2025-09-07T09:13:22.1326530Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/prepare_finalize.py' 2025-09-07T09:13:22.1327165Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe.py' 2025-09-07T09:13:22.1327805Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/routing_simulator.py' 2025-09-07T09:13:22.1328455Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/topk_weight_and_reduce.py' 2025-09-07T09:13:22.1329114Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/triton_deep_gemm_moe.py' 2025-09-07T09:13:22.1329775Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/trtllm_moe.py' 2025-09-07T09:13:22.1330327Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/utils.py' 2025-09-07T09:13:22.1331142Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1332283Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1333510Z #34 8347.3 
adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1334564Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1335624Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1336749Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1337783Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=3072,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1338830Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1339873Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1340907Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1342023Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=1,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1343051Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1344106Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=1024,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1345201Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1346150Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1347107Z #34 8347.3 adding 
'vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20-3e.json' 2025-09-07T09:13:22.1347984Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H20.json' 2025-09-07T09:13:22.1348854Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=192,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1349817Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=352,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1351020Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1352236Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1353399Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1354446Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20-3e.json' 2025-09-07T09:13:22.1355320Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H20.json' 2025-09-07T09:13:22.1356340Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1357392Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=384,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1358294Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=512,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1359271Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_B200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1360295Z #34 8347.3 adding 
'vllm/model_executor/layers/fused_moe/configs/E=128,N=704,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1361466Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1362681Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1363837Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1365074Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1366316Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H20.json' 2025-09-07T09:13:22.1367391Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1368484Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1369448Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=128,N=96,device_name=NVIDIA_H20.json' 2025-09-07T09:13:22.1370395Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1371425Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1372426Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_B200.json' 2025-09-07T09:13:22.1373613Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1024,device_name=NVIDIA_H100.json' 2025-09-07T09:13:22.1374540Z #34 8347.3 adding 
'vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-40GB.json' 2025-09-07T09:13:22.1375636Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1376690Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1344,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1377871Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1378947Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=14336,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1380169Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1381287Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1382342Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1383368Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=2688,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1384490Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1385797Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=3072,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1386913Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=3200,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1388124Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1389193Z #34 8347.3 adding 
'vllm/model_executor/layers/fused_moe/configs/E=16,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1390261Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=6400,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1391394Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1392827Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1393938Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=int8_w8a16.json' 2025-09-07T09:13:22.1395176Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=16,N=800,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1396337Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_A800-SXM4-80GB.json' 2025-09-07T09:13:22.1397280Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=192,device_name=NVIDIA_H20-3e.json' 2025-09-07T09:13:22.1398284Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=320,device_name=NVIDIA_H20-3e.json' 2025-09-07T09:13:22.1399425Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1400794Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1402144Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=160,N=640,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1403409Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 
2025-09-07T09:13:22.1404792Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1406190Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1407415Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=20,N=2560,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1408644Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325X,block_shape=[128,128].json' 2025-09-07T09:13:22.1409834Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1411290Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1412774Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8.json' 2025-09-07T09:13:22.1414045Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1415310Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8.json' 2025-09-07T09:13:22.1416583Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1417994Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1419258Z #34 8347.3 adding 
'vllm/model_executor/layers/fused_moe/configs/E=256,N=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1420564Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1421983Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1423338Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1424650Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1426050Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1427267Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1428500Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H20-3e,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1429804Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1431082Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1432365Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=256,N=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1433544Z #34 8347.3 adding 
'vllm/model_executor/layers/fused_moe/configs/E=256,N=64,device_name=NVIDIA_A800-SXM4-80GB.json' 2025-09-07T09:13:22.1434660Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1435953Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_GB200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1437219Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=40,N=2560,device_name=NVIDIA_H100,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1438330Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=60,N=1408,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1439316Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=60,N=176,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1440297Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=60,N=352,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1441318Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=60,N=704,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1442330Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=62,N=256,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1443369Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=62,N=512,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1444325Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1445395Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_A800-SXM4-80GB.json' 2025-09-07T09:13:22.1446463Z #34 8347.3 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1447528Z #34 8347.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1448582Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1449577Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1280,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1450553Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1451627Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1452982Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1453996Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=2560,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1455013Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1456030Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=3072,device_name=NVIDIA_H20.json' 2025-09-07T09:13:22.1457069Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1458170Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1459211Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1460280Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=320,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1461288Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20,dtype=fp8_w8a8.json' 
2025-09-07T09:13:22.1462287Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=384,device_name=NVIDIA_H20.json' 2025-09-07T09:13:22.1463287Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1464311Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_A800-SXM4-80GB.json' 2025-09-07T09:13:22.1465497Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1466671Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1467719Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1468738Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1469761Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=640,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1471099Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1472087Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=768,device_name=NVIDIA_H20.json' 2025-09-07T09:13:22.1472981Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=64,N=896,device_name=NVIDIA_H20.json' 2025-09-07T09:13:22.1473935Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=72,N=384,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1474911Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=72,N=768,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1475984Z #34 8347.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1477068Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1478116Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1479229Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=AMD_Instinct_MI325X.json' 2025-09-07T09:13:22.1480227Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1481367Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1482341Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=14336,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1483351Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1484402Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1485456Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1486495Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=16384,device_name=AMD_Instinct_MI325X.json' 2025-09-07T09:13:22.1487539Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1488621Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1489653Z #34 8347.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1490758Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=AMD_Instinct_MI325X.json' 2025-09-07T09:13:22.1491680Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-40GB.json' 2025-09-07T09:13:22.1493138Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1494091Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1495137Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1496227Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=1792,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1497267Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1498354Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1499412Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1500543Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=AMD_Instinct_MI325X.json' 2025-09-07T09:13:22.1501630Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1502697Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1503790Z #34 8347.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1505033Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.1506152Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1507160Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1508170Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1509208Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1510264Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1511290Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=AMD_Instinct_MI325X.json' 2025-09-07T09:13:22.1512289Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-40GB.json' 2025-09-07T09:13:22.1513270Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1514346Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1515503Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1516518Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1517574Z #34 8347.4 adding 
'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1518523Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1519504Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=3584,device_name=NVIDIA_L40S.json' 2025-09-07T09:13:22.1520510Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1521553Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1522595Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1523621Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=AMD_Instinct_MI325X.json' 2025-09-07T09:13:22.1524607Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1525704Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1526752Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1527824Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1528827Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=4096,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1529840Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1530866Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI300X.json' 
2025-09-07T09:13:22.1531906Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1533294Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=AMD_Instinct_MI325X.json' 2025-09-07T09:13:22.1534262Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_A100-SXM4-80GB.json' 2025-09-07T09:13:22.1535336Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1536493Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H100_80GB_HBM3.json' 2025-09-07T09:13:22.1537583Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1538534Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=7168,device_name=NVIDIA_H200.json' 2025-09-07T09:13:22.1539637Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1540702Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI300X.json' 2025-09-07T09:13:22.1541791Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1542852Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=AMD_Instinct_MI325X.json' 2025-09-07T09:13:22.1543946Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1545205Z #34 8347.4 adding 'vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H200,dtype=fp8_w8a8.json' 2025-09-07T09:13:22.1545962Z #34 8347.4 adding 
'vllm/model_executor/layers/fused_moe/configs/README' 2025-09-07T09:13:22.1546557Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/__init__.py' 2025-09-07T09:13:22.1547063Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/abstract.py' 2025-09-07T09:13:22.1547646Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/linear_attn.py' 2025-09-07T09:13:22.1548236Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/mamba2_metadata.py' 2025-09-07T09:13:22.1548831Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/mamba_mixer.py' 2025-09-07T09:13:22.1549424Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/mamba_mixer2.py' 2025-09-07T09:13:22.1549950Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/mamba_utils.py' 2025-09-07T09:13:22.1550535Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/short_conv.py' 2025-09-07T09:13:22.1551107Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/ops/__init__.py' 2025-09-07T09:13:22.1551673Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/ops/causal_conv1d.py' 2025-09-07T09:13:22.1552317Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/ops/layernorm_gated.py' 2025-09-07T09:13:22.1552930Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/ops/mamba_ssm.py' 2025-09-07T09:13:22.1553536Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/ops/ssd_bmm.py' 2025-09-07T09:13:22.1554082Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/ops/ssd_chunk_scan.py' 2025-09-07T09:13:22.1554746Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/ops/ssd_chunk_state.py' 2025-09-07T09:13:22.1555397Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/ops/ssd_combined.py' 2025-09-07T09:13:22.1556011Z #34 8347.4 adding 'vllm/model_executor/layers/mamba/ops/ssd_state_passing.py' 2025-09-07T09:13:22.1556719Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/__init__.py' 2025-09-07T09:13:22.1557359Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/auto_round.py' 2025-09-07T09:13:22.1557933Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/awq.py' 2025-09-07T09:13:22.1558538Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/awq_marlin.py' 2025-09-07T09:13:22.1559202Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/awq_triton.py' 2025-09-07T09:13:22.1559791Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/base_config.py' 2025-09-07T09:13:22.1560446Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/bitblas.py' 2025-09-07T09:13:22.1561100Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/bitsandbytes.py' 2025-09-07T09:13:22.1561697Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/deepgemm.py' 2025-09-07T09:13:22.1562352Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/deepspeedfp.py' 2025-09-07T09:13:22.1563010Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/experts_int8.py' 2025-09-07T09:13:22.1563649Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/fbgemm_fp8.py' 2025-09-07T09:13:22.1564241Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/fp8.py' 2025-09-07T09:13:22.1564797Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/gguf.py' 2025-09-07T09:13:22.1565414Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/gptq.py' 2025-09-07T09:13:22.1566027Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/gptq_bitblas.py' 2025-09-07T09:13:22.1566653Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/gptq_marlin.py' 2025-09-07T09:13:22.1567329Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/gptq_marlin_24.py' 2025-09-07T09:13:22.1568001Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/hqq_marlin.py' 2025-09-07T09:13:22.1568579Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/inc.py' 2025-09-07T09:13:22.1569220Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/input_quant_fp8.py' 2025-09-07T09:13:22.1569857Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/ipex_quant.py' 
2025-09-07T09:13:22.1570466Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kv_cache.py' 2025-09-07T09:13:22.1571112Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/modelopt.py' 2025-09-07T09:13:22.1571688Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/moe_wna16.py' 2025-09-07T09:13:22.1572419Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/mxfp4.py' 2025-09-07T09:13:22.1573285Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/petit.py' 2025-09-07T09:13:22.1573862Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/ptpc_fp8.py' 2025-09-07T09:13:22.1574491Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/rtn.py' 2025-09-07T09:13:22.1575108Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/schema.py' 2025-09-07T09:13:22.1575701Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/torchao.py' 2025-09-07T09:13:22.1576326Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/tpu_int8.py' 2025-09-07T09:13:22.1577071Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/__init__.py' 2025-09-07T09:13:22.1577937Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py' 2025-09-07T09:13:22.1578954Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py' 2025-09-07T09:13:22.1579938Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/triton_scaled_mm.py' 2025-09-07T09:13:22.1580800Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/utils.py' 2025-09-07T09:13:22.1581688Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/__init__.py' 2025-09-07T09:13:22.1582701Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_24.py' 2025-09-07T09:13:22.1583843Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme.py' 2025-09-07T09:13:22.1585064Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_24.py' 2025-09-07T09:13:22.1586186Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_nvfp4.py' 2025-09-07T09:13:22.1587387Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py' 2025-09-07T09:13:22.1588508Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_fp8.py' 2025-09-07T09:13:22.1589602Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_int.py' 2025-09-07T09:13:22.1590747Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8.py' 2025-09-07T09:13:22.1591851Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py' 2025-09-07T09:13:22.1593373Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8.py' 2025-09-07T09:13:22.1594454Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py' 2025-09-07T09:13:22.1595518Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/transform/linear.py' 2025-09-07T09:13:22.1596417Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/transform/module.py' 2025-09-07T09:13:22.1597333Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/transform/utils.py' 2025-09-07T09:13:22.1598391Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes/linear_qutlass_nvfp4.py' 2025-09-07T09:13:22.1599348Z #34 8347.4 
adding 'vllm/model_executor/layers/quantization/kernels/__init__.py' 2025-09-07T09:13:22.1600204Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/MPLinearKernel.py' 2025-09-07T09:13:22.1601145Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/__init__.py' 2025-09-07T09:13:22.1602049Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/allspark.py' 2025-09-07T09:13:22.1602953Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/bitblas.py' 2025-09-07T09:13:22.1603930Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/conch.py' 2025-09-07T09:13:22.1604899Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/cutlass.py' 2025-09-07T09:13:22.1605803Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/dynamic_4bit.py' 2025-09-07T09:13:22.1606636Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/exllama.py' 2025-09-07T09:13:22.1607555Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/machete.py' 2025-09-07T09:13:22.1608362Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin.py' 2025-09-07T09:13:22.1609254Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/ScaledMMLinearKernel.py' 2025-09-07T09:13:22.1610213Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/__init__.py' 2025-09-07T09:13:22.1610996Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/aiter.py' 2025-09-07T09:13:22.1611771Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu.py' 2025-09-07T09:13:22.1612562Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/cutlass.py' 2025-09-07T09:13:22.1613549Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/kernels/scaled_mm/triton.py' 2025-09-07T09:13:22.1614417Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/kernels/scaled_mm/xla.py' 2025-09-07T09:13:22.1615200Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/quark/__init__.py' 2025-09-07T09:13:22.1615881Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/quark/quark.py' 2025-09-07T09:13:22.1616517Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/quark/quark_moe.py' 2025-09-07T09:13:22.1617203Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/quark/utils.py' 2025-09-07T09:13:22.1617909Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/quark/schemes/__init__.py' 2025-09-07T09:13:22.1618685Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/quark/schemes/quark_scheme.py' 2025-09-07T09:13:22.1619543Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/quark/schemes/quark_w4a4_mxfp4.py' 2025-09-07T09:13:22.1620412Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8.py' 2025-09-07T09:13:22.1621324Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_int8.py' 2025-09-07T09:13:22.1622128Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/__init__.py' 2025-09-07T09:13:22.1622801Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/allspark_utils.py' 2025-09-07T09:13:22.1623578Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/bitblas_utils.py' 2025-09-07T09:13:22.1624363Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py' 2025-09-07T09:13:22.1625273Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/flashinfer_utils.py' 2025-09-07T09:13:22.1626001Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/fp8_utils.py' 2025-09-07T09:13:22.1626677Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/gptq_utils.py' 
2025-09-07T09:13:22.1627344Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/int8_utils.py' 2025-09-07T09:13:22.1628048Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/layer_utils.py' 2025-09-07T09:13:22.1628762Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/machete_utils.py' 2025-09-07T09:13:22.1629472Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/marlin_utils.py' 2025-09-07T09:13:22.1630158Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/marlin_utils_fp4.py' 2025-09-07T09:13:22.1630910Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py' 2025-09-07T09:13:22.2321947Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/marlin_utils_test.py' 2025-09-07T09:13:22.2322885Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/marlin_utils_test_24.py' 2025-09-07T09:13:22.2323591Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/mxfp4_utils.py' 2025-09-07T09:13:22.2324308Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/mxfp8_utils.py' 2025-09-07T09:13:22.2325036Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/nvfp4_emulation_utils.py' 2025-09-07T09:13:22.2325789Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/nvfp4_moe_support.py' 2025-09-07T09:13:22.2326472Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/petit_utils.py' 2025-09-07T09:13:22.2327130Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/quant_utils.py' 2025-09-07T09:13:22.2327830Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/w8a8_utils.py' 2025-09-07T09:13:22.2328881Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2330283Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=12288,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2331726Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2333442Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2334907Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2336383Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2337864Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2339370Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2340785Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2342159Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=1536,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2343567Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2345131Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2346545Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2347968Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2349401Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2350821Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2352183Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2353547Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2354865Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2356186Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=1536,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2357553Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2358970Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2360628Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2362091Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2363586Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2365047Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2366453Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2367800Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2369169Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2048,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2370597Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2372022Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2112,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2373596Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2375037Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2376501Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2377982Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2379445Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2380909Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2382326Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2383842Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2385176Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=2304,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2386561Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2387933Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2389307Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2390739Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2392339Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2394078Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2395559Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2396995Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2398431Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2399849Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2401231Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2402658Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2404029Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2405492Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=24576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2406857Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2408248Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2409654Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2411059Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2412743Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2414206Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2415645Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2416995Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=256,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2418405Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2419833Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2421335Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2422763Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2424124Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2425603Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2426918Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=1536,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2428279Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2429683Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2431087Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2432467Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2433872Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2435234Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2436572Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2437901Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=3072,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2439252Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2440660Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2442058Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2443488Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2444922Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2446328Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2447705Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2449078Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2450408Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2451797Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2453413Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2454775Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=32768,K=512,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2456241Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2457688Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2459163Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2460801Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2462221Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=36864,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2463679Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2465228Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2466616Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2467994Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2469358Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2470711Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2472038Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2473453Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=512,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2474980Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2476358Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4096,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2477890Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2479305Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2480723Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2482161Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2483646Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2485021Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2486339Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2487760Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=4608,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2489179Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2490568Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2492494Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2493907Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2495345Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2496701Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2498042Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=512,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2499443Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2500884Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2502391Z #34 8347.4 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2503937Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2505489Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2506849Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2508263Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2509736Z #34 8347.4 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2511052Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2512410Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2513723Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2515094Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=576,K=7168,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2516453Z #34 8347.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2517854Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2519303Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2520737Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2522169Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2523578Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2525046Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2526566Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2527895Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1024,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2529263Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2530731Z #34 8347.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2532277Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2533932Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2535394Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2536860Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2538278Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2539817Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2541190Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=1152,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2542667Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2544162Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2545699Z #34 8347.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2547162Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2548572Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2550022Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2551372Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2552690Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=128,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2554060Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2555457Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2556883Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2558352Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2559782Z #34 8347.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2561288Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2562816Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2564191Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2565526Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2566865Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2568192Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2569661Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=16384,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2571094Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2572763Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2574246Z #34 8347.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2575712Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A100-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2577245Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_A800-SXM4-80GB,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2578762Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2580247Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2581709Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2583091Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2584571Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2585976Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2587376Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=18432,device_name=NVIDIA_L20Y,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2588792Z #34 8347.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2590203Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2591608Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2593323Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2594815Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2596303Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2597681Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2599059Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2048,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2600456Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2601915Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2603437Z #34 8347.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2604877Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2606483Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2607849Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2609244Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2610642Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=2304,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2612063Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2613812Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2615400Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2616804Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_B200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2618170Z #34 8347.5 adding 
'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H20,dtype=int8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2619636Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_H200,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2621055Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=256,device_name=NVIDIA_L20,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2622464Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2623903Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2625456Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=7168,K=8192,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2626874Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2628267Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325X,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2629796Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/N=8192,K=1536,device_name=AMD_Instinct_MI325_OAM,dtype=fp8_w8a8,block_shape=[128,128].json' 2025-09-07T09:13:22.2630864Z #34 8347.5 adding 'vllm/model_executor/layers/quantization/utils/configs/README.md' 2025-09-07T09:13:22.2631518Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/__init__.py' 2025-09-07T09:13:22.2632125Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/base.py' 2025-09-07T09:13:22.2632695Z 
#34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/common.py' 2025-09-07T09:13:22.2633404Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py' 2025-09-07T09:13:22.2634100Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/dual_chunk_rope.py' 2025-09-07T09:13:22.2634804Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/dynamic_ntk_alpha_rope.py' 2025-09-07T09:13:22.2635562Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/dynamic_ntk_scaling_rope.py' 2025-09-07T09:13:22.2636268Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/ernie45_vl_rope.py' 2025-09-07T09:13:22.2636962Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/linear_scaling_rope.py' 2025-09-07T09:13:22.2637686Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/llama3_rope.py' 2025-09-07T09:13:22.2638462Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/llama4_vision_rope.py' 2025-09-07T09:13:22.2639115Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/mrope.py' 2025-09-07T09:13:22.2639741Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/ntk_scaling_rope.py' 2025-09-07T09:13:22.2640478Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/phi3_long_rope_scaled_rope.py' 2025-09-07T09:13:22.2641206Z #34 8347.5 adding 'vllm/model_executor/layers/rotary_embedding/yarn_scaling_rope.py' 2025-09-07T09:13:22.2641865Z #34 8347.5 adding 'vllm/model_executor/layers/shared_fused_moe/__init__.py' 2025-09-07T09:13:22.2642529Z #34 8347.5 adding 'vllm/model_executor/layers/shared_fused_moe/shared_fused_moe.py' 2025-09-07T09:13:22.2643131Z #34 8347.5 adding 'vllm/model_executor/model_loader/__init__.py' 2025-09-07T09:13:22.2643660Z #34 8347.5 adding 'vllm/model_executor/model_loader/base_loader.py' 2025-09-07T09:13:22.2644227Z #34 8347.5 adding 'vllm/model_executor/model_loader/bitsandbytes_loader.py' 2025-09-07T09:13:22.2644889Z #34 
8347.5 adding 'vllm/model_executor/model_loader/default_loader.py' 2025-09-07T09:13:22.2645505Z #34 8347.5 adding 'vllm/model_executor/model_loader/dummy_loader.py' 2025-09-07T09:13:22.2646056Z #34 8347.5 adding 'vllm/model_executor/model_loader/gguf_loader.py' 2025-09-07T09:13:22.2646640Z #34 8347.5 adding 'vllm/model_executor/model_loader/runai_streamer_loader.py' 2025-09-07T09:13:22.2647251Z #34 8347.5 adding 'vllm/model_executor/model_loader/sharded_state_loader.py' 2025-09-07T09:13:22.2647834Z #34 8347.5 adding 'vllm/model_executor/model_loader/tensorizer.py' 2025-09-07T09:13:22.2648423Z #34 8347.5 adding 'vllm/model_executor/model_loader/tensorizer_loader.py' 2025-09-07T09:13:22.2648967Z #34 8347.5 adding 'vllm/model_executor/model_loader/tpu.py' 2025-09-07T09:13:22.2649441Z #34 8347.5 adding 'vllm/model_executor/model_loader/utils.py' 2025-09-07T09:13:22.2649961Z #34 8347.5 adding 'vllm/model_executor/model_loader/weight_utils.py' 2025-09-07T09:13:22.2650475Z #34 8347.5 adding 'vllm/model_executor/models/__init__.py' 2025-09-07T09:13:22.2650934Z #34 8347.5 adding 'vllm/model_executor/models/adapters.py' 2025-09-07T09:13:22.2651398Z #34 8347.5 adding 'vllm/model_executor/models/aimv2.py' 2025-09-07T09:13:22.2651838Z #34 8347.5 adding 'vllm/model_executor/models/apertus.py' 2025-09-07T09:13:22.2652367Z #34 8347.5 adding 'vllm/model_executor/models/arcee.py' 2025-09-07T09:13:22.2652987Z #34 8347.5 adding 'vllm/model_executor/models/arctic.py' 2025-09-07T09:13:22.2653469Z #34 8347.5 adding 'vllm/model_executor/models/aria.py' 2025-09-07T09:13:22.2653936Z #34 8347.5 adding 'vllm/model_executor/models/aya_vision.py' 2025-09-07T09:13:22.2654442Z #34 8347.5 adding 'vllm/model_executor/models/baichuan.py' 2025-09-07T09:13:22.2654937Z #34 8347.5 adding 'vllm/model_executor/models/bailing_moe.py' 2025-09-07T09:13:22.2655405Z #34 8347.5 adding 'vllm/model_executor/models/bamba.py' 2025-09-07T09:13:22.2655856Z #34 8347.5 adding 'vllm/model_executor/models/bart.py' 
2025-09-07T09:13:22.2656295Z #34 8347.5 adding 'vllm/model_executor/models/bert.py' 2025-09-07T09:13:22.2656844Z #34 8347.5 adding 'vllm/model_executor/models/bert_with_rope.py' 2025-09-07T09:13:22.2657394Z #34 8347.5 adding 'vllm/model_executor/models/blip.py' 2025-09-07T09:13:22.2657881Z #34 8347.5 adding 'vllm/model_executor/models/blip2.py' 2025-09-07T09:13:22.2658340Z #34 8347.5 adding 'vllm/model_executor/models/bloom.py' 2025-09-07T09:13:22.2658798Z #34 8347.5 adding 'vllm/model_executor/models/chameleon.py' 2025-09-07T09:13:22.2659292Z #34 8347.5 adding 'vllm/model_executor/models/chatglm.py' 2025-09-07T09:13:22.2659748Z #34 8347.5 adding 'vllm/model_executor/models/clip.py' 2025-09-07T09:13:22.2660236Z #34 8347.5 adding 'vllm/model_executor/models/cohere2_vision.py' 2025-09-07T09:13:22.2660732Z #34 8347.5 adding 'vllm/model_executor/models/commandr.py' 2025-09-07T09:13:22.2661210Z #34 8347.5 adding 'vllm/model_executor/models/config.py' 2025-09-07T09:13:22.2661730Z #34 8347.5 adding 'vllm/model_executor/models/constant_size_cache.py' 2025-09-07T09:13:22.2662307Z #34 8347.5 adding 'vllm/model_executor/models/dbrx.py' 2025-09-07T09:13:22.2662792Z #34 8347.5 adding 'vllm/model_executor/models/deepseek.py' 2025-09-07T09:13:22.2663362Z #34 8347.5 adding 'vllm/model_executor/models/deepseek_eagle.py' 2025-09-07T09:13:22.2663886Z #34 8347.5 adding 'vllm/model_executor/models/deepseek_mtp.py' 2025-09-07T09:13:22.2664384Z #34 8347.5 adding 'vllm/model_executor/models/deepseek_v2.py' 2025-09-07T09:13:22.2664894Z #34 8347.5 adding 'vllm/model_executor/models/deepseek_vl2.py' 2025-09-07T09:13:22.2665506Z #34 8347.5 adding 'vllm/model_executor/models/donut.py' 2025-09-07T09:13:22.2665949Z #34 8347.5 adding 'vllm/model_executor/models/dots1.py' 2025-09-07T09:13:22.2666397Z #34 8347.5 adding 'vllm/model_executor/models/ernie45.py' 2025-09-07T09:13:22.2666858Z #34 8347.5 adding 'vllm/model_executor/models/ernie45_moe.py' 2025-09-07T09:13:22.2667341Z #34 8347.5 adding 
'vllm/model_executor/models/ernie45_vl.py' 2025-09-07T09:13:22.2667876Z #34 8347.5 adding 'vllm/model_executor/models/ernie45_vl_moe.py' 2025-09-07T09:13:22.2668404Z #34 8347.5 adding 'vllm/model_executor/models/ernie_mtp.py' 2025-09-07T09:13:22.2668888Z #34 8347.5 adding 'vllm/model_executor/models/exaone.py' 2025-09-07T09:13:22.2669345Z #34 8347.5 adding 'vllm/model_executor/models/exaone4.py' 2025-09-07T09:13:22.2669829Z #34 8347.5 adding 'vllm/model_executor/models/fairseq2_llama.py' 2025-09-07T09:13:22.2670301Z #34 8347.5 adding 'vllm/model_executor/models/falcon.py' 2025-09-07T09:13:22.2670817Z #34 8347.5 adding 'vllm/model_executor/models/falcon_h1.py' 2025-09-07T09:13:22.2671405Z #34 8347.5 adding 'vllm/model_executor/models/florence2.py' 2025-09-07T09:13:22.2671867Z #34 8347.5 adding 'vllm/model_executor/models/fuyu.py' 2025-09-07T09:13:22.2672291Z #34 8347.5 adding 'vllm/model_executor/models/gemma.py' 2025-09-07T09:13:22.2672730Z #34 8347.5 adding 'vllm/model_executor/models/gemma2.py' 2025-09-07T09:13:22.2673176Z #34 8347.5 adding 'vllm/model_executor/models/gemma3.py' 2025-09-07T09:13:22.2673628Z #34 8347.5 adding 'vllm/model_executor/models/gemma3_mm.py' 2025-09-07T09:13:22.2674093Z #34 8347.5 adding 'vllm/model_executor/models/gemma3n.py' 2025-09-07T09:13:22.2674613Z #34 8347.5 adding 'vllm/model_executor/models/gemma3n_mm.py' 2025-09-07T09:13:22.2675070Z #34 8347.5 adding 'vllm/model_executor/models/glm.py' 2025-09-07T09:13:22.2675556Z #34 8347.5 adding 'vllm/model_executor/models/glm4.py' 2025-09-07T09:13:22.2676000Z #34 8347.5 adding 'vllm/model_executor/models/glm4_1v.py' 2025-09-07T09:13:22.2676450Z #34 8347.5 adding 'vllm/model_executor/models/glm4_moe.py' 2025-09-07T09:13:22.3323142Z #34 8347.5 adding 'vllm/model_executor/models/glm4_moe_mtp.py' 2025-09-07T09:13:22.3323895Z #34 8347.5 adding 'vllm/model_executor/models/glm4v.py' 2025-09-07T09:13:22.3324381Z #34 8347.5 adding 'vllm/model_executor/models/gpt2.py' 2025-09-07T09:13:22.3324834Z #34 
8347.5 adding 'vllm/model_executor/models/gpt_bigcode.py' 2025-09-07T09:13:22.3325310Z #34 8347.5 adding 'vllm/model_executor/models/gpt_j.py' 2025-09-07T09:13:22.3325750Z #34 8347.5 adding 'vllm/model_executor/models/gpt_neox.py' 2025-09-07T09:13:22.3326264Z #34 8347.5 adding 'vllm/model_executor/models/gpt_oss.py' 2025-09-07T09:13:22.3326853Z #34 8347.5 adding 'vllm/model_executor/models/granite.py' 2025-09-07T09:13:22.3327411Z #34 8347.5 adding 'vllm/model_executor/models/granite_speech.py' 2025-09-07T09:13:22.3327971Z #34 8347.5 adding 'vllm/model_executor/models/granitemoe.py' 2025-09-07T09:13:22.3328516Z #34 8347.5 adding 'vllm/model_executor/models/granitemoehybrid.py' 2025-09-07T09:13:22.3329073Z #34 8347.5 adding 'vllm/model_executor/models/granitemoeshared.py' 2025-09-07T09:13:22.3329626Z #34 8347.5 adding 'vllm/model_executor/models/gritlm.py' 2025-09-07T09:13:22.3330131Z #34 8347.5 adding 'vllm/model_executor/models/grok1.py' 2025-09-07T09:13:22.3330561Z #34 8347.5 adding 'vllm/model_executor/models/h2ovl.py' 2025-09-07T09:13:22.3331085Z #34 8347.5 adding 'vllm/model_executor/models/hunyuan_v1.py' 2025-09-07T09:13:22.3331737Z #34 8347.5 adding 'vllm/model_executor/models/hyperclovax_vision.py' 2025-09-07T09:13:22.3332444Z #34 8347.5 adding 'vllm/model_executor/models/idefics2_vision_model.py' 2025-09-07T09:13:22.3332972Z #34 8347.5 adding 'vllm/model_executor/models/idefics3.py' 2025-09-07T09:13:22.3333691Z #34 8347.5 adding 'vllm/model_executor/models/interfaces.py' 2025-09-07T09:13:22.3334271Z #34 8347.5 adding 'vllm/model_executor/models/interfaces_base.py' 2025-09-07T09:13:22.3334821Z #34 8347.5 adding 'vllm/model_executor/models/intern_vit.py' 2025-09-07T09:13:22.3335402Z #34 8347.5 adding 'vllm/model_executor/models/internlm2.py' 2025-09-07T09:13:22.3335972Z #34 8347.5 adding 'vllm/model_executor/models/internlm2_ve.py' 2025-09-07T09:13:22.3336534Z #34 8347.5 adding 'vllm/model_executor/models/interns1.py' 2025-09-07T09:13:22.3337018Z #34 8347.5 
adding 'vllm/model_executor/models/interns1_vit.py' 2025-09-07T09:13:22.3337573Z #34 8347.5 adding 'vllm/model_executor/models/internvl.py' 2025-09-07T09:13:22.3338102Z #34 8347.5 adding 'vllm/model_executor/models/jais.py' 2025-09-07T09:13:22.3338537Z #34 8347.5 adding 'vllm/model_executor/models/jamba.py' 2025-09-07T09:13:22.3339071Z #34 8347.5 adding 'vllm/model_executor/models/jina_vl.py' 2025-09-07T09:13:22.3339585Z #34 8347.5 adding 'vllm/model_executor/models/keye.py' 2025-09-07T09:13:22.3340055Z #34 8347.5 adding 'vllm/model_executor/models/keye_vl1_5.py' 2025-09-07T09:13:22.3340606Z #34 8347.5 adding 'vllm/model_executor/models/kimi_vl.py' 2025-09-07T09:13:22.3341120Z #34 8347.5 adding 'vllm/model_executor/models/lfm2.py' 2025-09-07T09:13:22.3341630Z #34 8347.5 adding 'vllm/model_executor/models/llama.py' 2025-09-07T09:13:22.3342159Z #34 8347.5 adding 'vllm/model_executor/models/llama4.py' 2025-09-07T09:13:22.3342705Z #34 8347.5 adding 'vllm/model_executor/models/llama4_eagle.py' 2025-09-07T09:13:22.3343209Z #34 8347.5 adding 'vllm/model_executor/models/llama_eagle.py' 2025-09-07T09:13:22.3343890Z #34 8347.5 adding 'vllm/model_executor/models/llama_eagle3.py' 2025-09-07T09:13:22.3344416Z #34 8347.6 adding 'vllm/model_executor/models/llava.py' 2025-09-07T09:13:22.3344943Z #34 8347.6 adding 'vllm/model_executor/models/llava_next.py' 2025-09-07T09:13:22.3345449Z #34 8347.6 adding 'vllm/model_executor/models/llava_next_video.py' 2025-09-07T09:13:22.3346034Z #34 8347.6 adding 'vllm/model_executor/models/llava_onevision.py' 2025-09-07T09:13:22.3346581Z #34 8347.6 adding 'vllm/model_executor/models/mamba.py' 2025-09-07T09:13:22.3347013Z #34 8347.6 adding 'vllm/model_executor/models/mamba2.py' 2025-09-07T09:13:22.3347553Z #34 8347.6 adding 'vllm/model_executor/models/mamba_cache.py' 2025-09-07T09:13:22.3348071Z #34 8347.6 adding 'vllm/model_executor/models/medusa.py' 2025-09-07T09:13:22.3348536Z #34 8347.6 adding 'vllm/model_executor/models/midashenglm.py' 
2025-09-07T09:13:22.3349068Z #34 8347.6 adding 'vllm/model_executor/models/mimo.py' 2025-09-07T09:13:22.3349570Z #34 8347.6 adding 'vllm/model_executor/models/mimo_mtp.py' 2025-09-07T09:13:22.3350036Z #34 8347.6 adding 'vllm/model_executor/models/minicpm.py' 2025-09-07T09:13:22.3350555Z #34 8347.6 adding 'vllm/model_executor/models/minicpm3.py' 2025-09-07T09:13:22.3351110Z #34 8347.6 adding 'vllm/model_executor/models/minicpm_eagle.py' 2025-09-07T09:13:22.3351635Z #34 8347.6 adding 'vllm/model_executor/models/minicpmo.py' 2025-09-07T09:13:22.3352166Z #34 8347.6 adding 'vllm/model_executor/models/minicpmv.py' 2025-09-07T09:13:22.3352714Z #34 8347.6 adding 'vllm/model_executor/models/minimax_cache.py' 2025-09-07T09:13:22.3353248Z #34 8347.6 adding 'vllm/model_executor/models/minimax_text_01.py' 2025-09-07T09:13:22.3353800Z #34 8347.6 adding 'vllm/model_executor/models/minimax_vl_01.py' 2025-09-07T09:13:22.3354344Z #34 8347.6 adding 'vllm/model_executor/models/mistral3.py' 2025-09-07T09:13:22.3354870Z #34 8347.6 adding 'vllm/model_executor/models/mixtral.py' 2025-09-07T09:13:22.3355330Z #34 8347.6 adding 'vllm/model_executor/models/mixtral_quant.py' 2025-09-07T09:13:22.3355862Z #34 8347.6 adding 'vllm/model_executor/models/mllama.py' 2025-09-07T09:13:22.3356067Z #34 8347.6 adding 'vllm/model_executor/models/mllama4.py' 2025-09-07T09:13:22.3356292Z #34 8347.6 adding 'vllm/model_executor/models/mlp_speculator.py' 2025-09-07T09:13:22.3356507Z #34 8347.6 adding 'vllm/model_executor/models/modernbert.py' 2025-09-07T09:13:22.3356697Z #34 8347.6 adding 'vllm/model_executor/models/module_mapping.py' 2025-09-07T09:13:22.3356851Z #34 8347.6 adding 'vllm/model_executor/models/molmo.py' 2025-09-07T09:13:22.3357043Z #34 8347.6 adding 'vllm/model_executor/models/moonvit.py' 2025-09-07T09:13:22.3357238Z #34 8347.6 adding 'vllm/model_executor/models/mpt.py' 2025-09-07T09:13:22.3357440Z #34 8347.6 adding 'vllm/model_executor/models/nemotron.py' 2025-09-07T09:13:22.3357628Z #34 8347.6 
adding 'vllm/model_executor/models/nemotron_h.py' 2025-09-07T09:13:22.3357837Z #34 8347.6 adding 'vllm/model_executor/models/nemotron_nas.py' 2025-09-07T09:13:22.3358045Z #34 8347.6 adding 'vllm/model_executor/models/nemotron_vl.py' 2025-09-07T09:13:22.3358219Z #34 8347.6 adding 'vllm/model_executor/models/nvlm_d.py' 2025-09-07T09:13:22.3358371Z #34 8347.6 adding 'vllm/model_executor/models/olmo.py' 2025-09-07T09:13:22.3358520Z #34 8347.6 adding 'vllm/model_executor/models/olmo2.py' 2025-09-07T09:13:22.3358741Z #34 8347.6 adding 'vllm/model_executor/models/olmoe.py' 2025-09-07T09:13:22.3358906Z #34 8347.6 adding 'vllm/model_executor/models/opt.py' 2025-09-07T09:13:22.3359058Z #34 8347.6 adding 'vllm/model_executor/models/orion.py' 2025-09-07T09:13:22.3359240Z #34 8347.6 adding 'vllm/model_executor/models/ovis.py' 2025-09-07T09:13:22.3359505Z #34 8347.6 adding 'vllm/model_executor/models/ovis2_5.py' 2025-09-07T09:13:22.3359682Z #34 8347.6 adding 'vllm/model_executor/models/paligemma.py' 2025-09-07T09:13:22.3359852Z #34 8347.6 adding 'vllm/model_executor/models/persimmon.py' 2025-09-07T09:13:22.3360014Z #34 8347.6 adding 'vllm/model_executor/models/phi.py' 2025-09-07T09:13:22.3360226Z #34 8347.6 adding 'vllm/model_executor/models/phi3.py' 2025-09-07T09:13:22.3360383Z #34 8347.6 adding 'vllm/model_executor/models/phi3v.py' 2025-09-07T09:13:22.3360589Z #34 8347.6 adding 'vllm/model_executor/models/phi4_multimodal.py' 2025-09-07T09:13:22.3360761Z #34 8347.6 adding 'vllm/model_executor/models/phi4flash.py' 2025-09-07T09:13:22.3360979Z #34 8347.6 adding 'vllm/model_executor/models/phi4mm.py' 2025-09-07T09:13:22.3361167Z #34 8347.6 adding 'vllm/model_executor/models/phi4mm_audio.py' 2025-09-07T09:13:22.3361345Z #34 8347.6 adding 'vllm/model_executor/models/phi4mm_utils.py' 2025-09-07T09:13:22.3361506Z #34 8347.6 adding 'vllm/model_executor/models/phimoe.py' 2025-09-07T09:13:22.3372395Z #34 8347.6 adding 'vllm/model_executor/models/pixtral.py' 2025-09-07T09:13:22.3372825Z 
#34 8347.6 adding 'vllm/model_executor/models/plamo2.py' 2025-09-07T09:13:22.3372999Z #34 8347.6 adding 'vllm/model_executor/models/qwen.py' 2025-09-07T09:13:22.3373245Z #34 8347.6 adding 'vllm/model_executor/models/qwen2.py' 2025-09-07T09:13:22.3373488Z #34 8347.6 adding 'vllm/model_executor/models/qwen2_5_omni_thinker.py' 2025-09-07T09:13:22.3373674Z #34 8347.6 adding 'vllm/model_executor/models/qwen2_5_vl.py' 2025-09-07T09:13:22.3373856Z #34 8347.6 adding 'vllm/model_executor/models/qwen2_audio.py' 2025-09-07T09:13:22.3374120Z #34 8347.6 adding 'vllm/model_executor/models/qwen2_moe.py' 2025-09-07T09:13:22.3374302Z #34 8347.6 adding 'vllm/model_executor/models/qwen2_rm.py' 2025-09-07T09:13:22.3374473Z #34 8347.6 adding 'vllm/model_executor/models/qwen2_vl.py' 2025-09-07T09:13:22.3374633Z #34 8347.6 adding 'vllm/model_executor/models/qwen3.py' 2025-09-07T09:13:22.3374815Z #34 8347.6 adding 'vllm/model_executor/models/qwen3_moe.py' 2025-09-07T09:13:22.3374985Z #34 8347.6 adding 'vllm/model_executor/models/qwen_vl.py' 2025-09-07T09:13:22.3375161Z #34 8347.6 adding 'vllm/model_executor/models/registry.py' 2025-09-07T09:13:22.3375341Z #34 8347.6 adding 'vllm/model_executor/models/roberta.py' 2025-09-07T09:13:22.3375497Z #34 8347.6 adding 'vllm/model_executor/models/rvl.py' 2025-09-07T09:13:22.3375705Z #34 8347.6 adding 'vllm/model_executor/models/seed_oss.py' 2025-09-07T09:13:22.3375869Z #34 8347.6 adding 'vllm/model_executor/models/siglip.py' 2025-09-07T09:13:22.3376066Z #34 8347.6 adding 'vllm/model_executor/models/siglip2navit.py' 2025-09-07T09:13:22.3376250Z #34 8347.6 adding 'vllm/model_executor/models/skyworkr1v.py' 2025-09-07T09:13:22.3376418Z #34 8347.6 adding 'vllm/model_executor/models/smolvlm.py' 2025-09-07T09:13:22.3376588Z #34 8347.6 adding 'vllm/model_executor/models/solar.py' 2025-09-07T09:13:22.3376759Z #34 8347.6 adding 'vllm/model_executor/models/stablelm.py' 2025-09-07T09:13:22.3376973Z #34 8347.6 adding 'vllm/model_executor/models/starcoder2.py' 
2025-09-07T09:13:22.3377162Z #34 8347.6 adding 'vllm/model_executor/models/step3_text.py' 2025-09-07T09:13:22.3377335Z #34 8347.6 adding 'vllm/model_executor/models/step3_vl.py' 2025-09-07T09:13:22.3377489Z #34 8347.6 adding 'vllm/model_executor/models/swin.py' 2025-09-07T09:13:22.3377660Z #34 8347.6 adding 'vllm/model_executor/models/tarsier.py' 2025-09-07T09:13:22.3377849Z #34 8347.6 adding 'vllm/model_executor/models/telechat2.py' 2025-09-07T09:13:22.3378017Z #34 8347.6 adding 'vllm/model_executor/models/teleflm.py' 2025-09-07T09:13:22.4326464Z #34 8347.6 adding 'vllm/model_executor/models/terratorch.py' 2025-09-07T09:13:22.4326690Z #34 8347.6 adding 'vllm/model_executor/models/transformers.py' 2025-09-07T09:13:22.4327254Z #34 8347.6 adding 'vllm/model_executor/models/ultravox.py' 2025-09-07T09:13:22.4327713Z #34 8347.6 adding 'vllm/model_executor/models/utils.py' 2025-09-07T09:13:22.4328164Z #34 8347.6 adding 'vllm/model_executor/models/vision.py' 2025-09-07T09:13:22.4328754Z #34 8347.6 adding 'vllm/model_executor/models/voxtral.py' 2025-09-07T09:13:22.4329225Z #34 8347.6 adding 'vllm/model_executor/models/whisper.py' 2025-09-07T09:13:22.4329681Z #34 8347.6 adding 'vllm/model_executor/models/zamba2.py' 2025-09-07T09:13:22.4330128Z #34 8347.6 adding 'vllm/model_executor/warmup/__init__.py' 2025-09-07T09:13:22.4330627Z #34 8347.6 adding 'vllm/model_executor/warmup/deep_gemm_warmup.py' 2025-09-07T09:13:22.4331148Z #34 8347.6 adding 'vllm/model_executor/warmup/kernel_warmup.py' 2025-09-07T09:13:22.4331608Z #34 8347.6 adding 'vllm/multimodal/__init__.py' 2025-09-07T09:13:22.4332003Z #34 8347.6 adding 'vllm/multimodal/audio.py' 2025-09-07T09:13:22.4332671Z #34 8347.6 adding 'vllm/multimodal/base.py' 2025-09-07T09:13:22.4333056Z #34 8347.6 adding 'vllm/multimodal/cache.py' 2025-09-07T09:13:22.4333440Z #34 8347.6 adding 'vllm/multimodal/hasher.py' 2025-09-07T09:13:22.4333841Z #34 8347.6 adding 'vllm/multimodal/image.py' 2025-09-07T09:13:22.4334221Z #34 8347.6 adding 
'vllm/multimodal/inputs.py' 2025-09-07T09:13:22.4334657Z #34 8347.6 adding 'vllm/multimodal/parse.py' 2025-09-07T09:13:22.4335085Z #34 8347.6 adding 'vllm/multimodal/processing.py' 2025-09-07T09:13:22.4335542Z #34 8347.6 adding 'vllm/multimodal/profiling.py' 2025-09-07T09:13:22.4335945Z #34 8347.6 adding 'vllm/multimodal/registry.py' 2025-09-07T09:13:22.4336343Z #34 8347.6 adding 'vllm/multimodal/utils.py' 2025-09-07T09:13:22.4336718Z #34 8347.6 adding 'vllm/multimodal/video.py' 2025-09-07T09:13:22.4337104Z #34 8347.6 adding 'vllm/platforms/__init__.py' 2025-09-07T09:13:22.4337491Z #34 8347.6 adding 'vllm/platforms/cpu.py' 2025-09-07T09:13:22.4337940Z #34 8347.6 adding 'vllm/platforms/cuda.py' 2025-09-07T09:13:22.4338330Z #34 8347.6 adding 'vllm/platforms/interface.py' 2025-09-07T09:13:22.4338714Z #34 8347.6 adding 'vllm/platforms/rocm.py' 2025-09-07T09:13:22.4339080Z #34 8347.6 adding 'vllm/platforms/tpu.py' 2025-09-07T09:13:22.4339441Z #34 8347.6 adding 'vllm/platforms/xpu.py' 2025-09-07T09:13:22.4339926Z #34 8347.6 adding 'vllm/plugins/__init__.py' 2025-09-07T09:13:22.4340348Z #34 8347.6 adding 'vllm/plugins/io_processors/__init__.py' 2025-09-07T09:13:22.4340837Z #34 8347.6 adding 'vllm/plugins/io_processors/interface.py' 2025-09-07T09:13:22.4341315Z #34 8347.6 adding 'vllm/plugins/lora_resolvers/README.md' 2025-09-07T09:13:22.4341781Z #34 8347.6 adding 'vllm/plugins/lora_resolvers/__init__.py' 2025-09-07T09:13:22.4342388Z #34 8347.6 adding 'vllm/plugins/lora_resolvers/filesystem_resolver.py' 2025-09-07T09:13:22.4342878Z #34 8347.6 adding 'vllm/profiler/__init__.py' 2025-09-07T09:13:22.4343309Z #34 8347.6 adding 'vllm/profiler/layerwise_profile.py' 2025-09-07T09:13:22.4343715Z #34 8347.6 adding 'vllm/profiler/utils.py' 2025-09-07T09:13:22.4344197Z #34 8347.6 adding 'vllm/ray/__init__.py' 2025-09-07T09:13:22.4344540Z #34 8347.6 adding 'vllm/ray/lazy_utils.py' 2025-09-07T09:13:22.4344892Z #34 8347.6 adding 'vllm/ray/ray_env.py' 2025-09-07T09:13:22.4345250Z #34 
8347.6 adding 'vllm/reasoning/__init__.py' 2025-09-07T09:13:22.4345721Z #34 8347.6 adding 'vllm/reasoning/abs_reasoning_parsers.py' 2025-09-07T09:13:22.4346237Z #34 8347.6 adding 'vllm/reasoning/deepseek_r1_reasoning_parser.py' 2025-09-07T09:13:22.4346751Z #34 8347.6 adding 'vllm/reasoning/glm4_moe_reasoning_parser.py' 2025-09-07T09:13:22.4347253Z #34 8347.6 adding 'vllm/reasoning/gptoss_reasoning_parser.py' 2025-09-07T09:13:22.4347740Z #34 8347.6 adding 'vllm/reasoning/granite_reasoning_parser.py' 2025-09-07T09:13:22.4348260Z #34 8347.6 adding 'vllm/reasoning/hunyuan_a13b_reasoning_parser.py' 2025-09-07T09:13:22.4348777Z #34 8347.6 adding 'vllm/reasoning/mistral_reasoning_parser.py' 2025-09-07T09:13:22.4349266Z #34 8347.6 adding 'vllm/reasoning/qwen3_reasoning_parser.py' 2025-09-07T09:13:22.4349749Z #34 8347.6 adding 'vllm/reasoning/step3_reasoning_parser.py' 2025-09-07T09:13:22.4350182Z #34 8347.6 adding 'vllm/third_party/__init__.py' 2025-09-07T09:13:22.4350579Z #34 8347.7 adding 'vllm/third_party/pynvml.py' 2025-09-07T09:13:22.4351012Z #34 8347.7 adding 'vllm/transformers_utils/__init__.py' 2025-09-07T09:13:22.4351452Z #34 8347.7 adding 'vllm/transformers_utils/config.py' 2025-09-07T09:13:22.4351891Z #34 8347.7 adding 'vllm/transformers_utils/detokenizer.py' 2025-09-07T09:13:22.4352384Z #34 8347.7 adding 'vllm/transformers_utils/detokenizer_utils.py' 2025-09-07T09:13:22.4352886Z #34 8347.7 adding 'vllm/transformers_utils/dynamic_module.py' 2025-09-07T09:13:22.4353354Z #34 8347.7 adding 'vllm/transformers_utils/processor.py' 2025-09-07T09:13:22.4353799Z #34 8347.7 adding 'vllm/transformers_utils/s3_utils.py' 2025-09-07T09:13:22.4354241Z #34 8347.7 adding 'vllm/transformers_utils/tokenizer.py' 2025-09-07T09:13:22.4354709Z #34 8347.7 adding 'vllm/transformers_utils/tokenizer_base.py' 2025-09-07T09:13:22.4355193Z #34 8347.7 adding 'vllm/transformers_utils/tokenizer_group.py' 2025-09-07T09:13:22.4355658Z #34 8347.7 adding 'vllm/transformers_utils/utils.py' 
2025-09-07T09:13:22.4356155Z #34 8347.7 adding 'vllm/transformers_utils/chat_templates/__init__.py' 2025-09-07T09:13:22.4356724Z #34 8347.7 adding 'vllm/transformers_utils/chat_templates/registry.py' 2025-09-07T09:13:22.4357337Z #34 8347.7 adding 'vllm/transformers_utils/chat_templates/template_basic.jinja' 2025-09-07T09:13:22.4357990Z #34 8347.7 adding 'vllm/transformers_utils/chat_templates/template_blip2.jinja' 2025-09-07T09:13:22.4358648Z #34 8347.7 adding 'vllm/transformers_utils/chat_templates/template_chatml.jinja' 2025-09-07T09:13:22.4359332Z #34 8347.7 adding 'vllm/transformers_utils/chat_templates/template_deepseek_vl2.jinja' 2025-09-07T09:13:22.4360017Z #34 8347.7 adding 'vllm/transformers_utils/chat_templates/template_fuyu.jinja' 2025-09-07T09:13:22.4360727Z #34 8347.7 adding 'vllm/transformers_utils/chat_templates/template_minicpmv45.jinja' 2025-09-07T09:13:22.4361332Z #34 8347.7 adding 'vllm/transformers_utils/configs/__init__.py' 2025-09-07T09:13:22.4361833Z #34 8347.7 adding 'vllm/transformers_utils/configs/arctic.py' 2025-09-07T09:13:22.4362317Z #34 8347.7 adding 'vllm/transformers_utils/configs/chatglm.py' 2025-09-07T09:13:22.4362839Z #34 8347.7 adding 'vllm/transformers_utils/configs/deepseek_vl2.py' 2025-09-07T09:13:22.4363347Z #34 8347.7 adding 'vllm/transformers_utils/configs/eagle.py' 2025-09-07T09:13:22.4363832Z #34 8347.7 adding 'vllm/transformers_utils/configs/falcon.py' 2025-09-07T09:13:22.4364314Z #34 8347.7 adding 'vllm/transformers_utils/configs/jais.py' 2025-09-07T09:13:22.4364824Z #34 8347.7 adding 'vllm/transformers_utils/configs/kimi_vl.py' 2025-09-07T09:13:22.4365311Z #34 8347.7 adding 'vllm/transformers_utils/configs/medusa.py' 2025-09-07T09:13:22.4366105Z #34 8347.7 adding 'vllm/transformers_utils/configs/midashenglm.py' 2025-09-07T09:13:22.4366656Z #34 8347.7 adding 'vllm/transformers_utils/configs/mistral.py' 2025-09-07T09:13:22.4367179Z #34 8347.7 adding 'vllm/transformers_utils/configs/mlp_speculator.py' 
2025-09-07T09:13:22.4367720Z #34 8347.7 adding 'vllm/transformers_utils/configs/moonvit.py' 2025-09-07T09:13:22.4368221Z #34 8347.7 adding 'vllm/transformers_utils/configs/nemotron.py' 2025-09-07T09:13:22.4369098Z #34 8347.7 adding 'vllm/transformers_utils/configs/nemotron_h.py' 2025-09-07T09:13:22.4369635Z #34 8347.7 adding 'vllm/transformers_utils/configs/nemotron_vl.py' 2025-09-07T09:13:22.4370221Z #34 8347.7 adding 'vllm/transformers_utils/configs/ovis.py' 2025-09-07T09:13:22.4370719Z #34 8347.7 adding 'vllm/transformers_utils/configs/step3_vl.py' 2025-09-07T09:13:22.4371223Z #34 8347.7 adding 'vllm/transformers_utils/configs/ultravox.py' 2025-09-07T09:13:22.4371786Z #34 8347.7 adding 'vllm/transformers_utils/configs/speculators/__init__.py' 2025-09-07T09:13:22.4372478Z #34 8347.7 adding 'vllm/transformers_utils/configs/speculators/algos.py' 2025-09-07T09:13:22.4373303Z #34 8347.7 adding 'vllm/transformers_utils/configs/speculators/base.py' 2025-09-07T09:13:22.4373885Z #34 8347.7 adding 'vllm/transformers_utils/processors/__init__.py' 2025-09-07T09:13:22.4374441Z #34 8347.7 adding 'vllm/transformers_utils/processors/deepseek_vl2.py' 2025-09-07T09:13:22.4375046Z #34 8347.7 adding 'vllm/transformers_utils/processors/ovis.py' 2025-09-07T09:13:22.4375570Z #34 8347.7 adding 'vllm/transformers_utils/processors/ovis2_5.py' 2025-09-07T09:13:22.4376120Z #34 8347.7 adding 'vllm/transformers_utils/tokenizers/__init__.py' 2025-09-07T09:13:22.4376671Z #34 8347.7 adding 'vllm/transformers_utils/tokenizers/mistral.py' 2025-09-07T09:13:22.4377144Z #34 8347.7 adding 'vllm/triton_utils/__init__.py' 2025-09-07T09:13:22.4377580Z #34 8347.7 adding 'vllm/triton_utils/importing.py' 2025-09-07T09:13:22.4377981Z #34 8347.7 adding 'vllm/usage/__init__.py' 2025-09-07T09:13:22.4378357Z #34 8347.7 adding 'vllm/usage/usage_lib.py' 2025-09-07T09:13:22.4378717Z #34 8347.7 adding 'vllm/utils/__init__.py' 2025-09-07T09:13:22.4379092Z #34 8347.7 adding 'vllm/utils/deep_gemm.py' 
2025-09-07T09:13:22.4379475Z #34 8347.7 adding 'vllm/utils/flashinfer.py' 2025-09-07T09:13:22.4379844Z #34 8347.7 adding 'vllm/utils/jsontree.py' 2025-09-07T09:13:22.4380235Z #34 8347.7 adding 'vllm/utils/tensor_schema.py' 2025-09-07T09:13:22.4380610Z #34 8347.7 adding 'vllm/v1/__init__.py' 2025-09-07T09:13:22.4380998Z #34 8347.7 adding 'vllm/v1/cudagraph_dispatcher.py' 2025-09-07T09:13:22.4381416Z #34 8347.7 adding 'vllm/v1/kv_cache_interface.py' 2025-09-07T09:13:22.4381808Z #34 8347.7 adding 'vllm/v1/outputs.py' 2025-09-07T09:13:22.4382145Z #34 8347.7 adding 'vllm/v1/request.py' 2025-09-07T09:13:22.4382504Z #34 8347.7 adding 'vllm/v1/serial_utils.py' 2025-09-07T09:13:22.4382854Z #34 8347.7 adding 'vllm/v1/utils.py' 2025-09-07T09:13:22.4383222Z #34 8347.7 adding 'vllm/v1/attention/__init__.py' 2025-09-07T09:13:22.4383691Z #34 8347.7 adding 'vllm/v1/attention/backends/__init__.py' 2025-09-07T09:13:22.4384154Z #34 8347.7 adding 'vllm/v1/attention/backends/cpu_attn.py' 2025-09-07T09:13:22.4384741Z #34 8347.7 adding 'vllm/v1/attention/backends/flash_attn.py' 2025-09-07T09:13:22.4385214Z #34 8347.7 adding 'vllm/v1/attention/backends/flashinfer.py' 2025-09-07T09:13:22.4385701Z #34 8347.7 adding 'vllm/v1/attention/backends/flex_attention.py' 2025-09-07T09:13:22.4386197Z #34 8347.7 adding 'vllm/v1/attention/backends/linear_attn.py' 2025-09-07T09:13:22.4386666Z #34 8347.7 adding 'vllm/v1/attention/backends/mamba1_attn.py' 2025-09-07T09:13:22.4387131Z #34 8347.7 adding 'vllm/v1/attention/backends/mamba2_attn.py' 2025-09-07T09:13:22.4387591Z #34 8347.7 adding 'vllm/v1/attention/backends/mamba_attn.py' 2025-09-07T09:13:22.4388056Z #34 8347.7 adding 'vllm/v1/attention/backends/pallas.py' 2025-09-07T09:13:22.4388563Z #34 8347.7 adding 'vllm/v1/attention/backends/rocm_aiter_fa.py' 2025-09-07T09:13:22.4389072Z #34 8347.7 adding 'vllm/v1/attention/backends/short_conv_attn.py' 2025-09-07T09:13:22.4389552Z #34 8347.7 adding 'vllm/v1/attention/backends/tree_attn.py' 
2025-09-07T09:13:22.4390015Z #34 8347.7 adding 'vllm/v1/attention/backends/triton_attn.py' 2025-09-07T09:13:22.4390466Z #34 8347.7 adding 'vllm/v1/attention/backends/utils.py' 2025-09-07T09:13:22.4390899Z #34 8347.7 adding 'vllm/v1/attention/backends/xformers.py' 2025-09-07T09:13:22.4391404Z #34 8347.7 adding 'vllm/v1/attention/backends/mla/__init__.py' 2025-09-07T09:13:22.4392046Z #34 8347.7 adding 'vllm/v1/attention/backends/mla/common.py' 2025-09-07T09:13:22.4392737Z #34 8347.7 adding 'vllm/v1/attention/backends/mla/cutlass_mla.py' 2025-09-07T09:13:22.4393272Z #34 8347.7 adding 'vllm/v1/attention/backends/mla/flashattn_mla.py' 2025-09-07T09:13:22.4393801Z #34 8347.7 adding 'vllm/v1/attention/backends/mla/flashmla.py' 2025-09-07T09:13:22.4394331Z #34 8347.7 adding 'vllm/v1/attention/backends/mla/rocm_aiter_mla.py' 2025-09-07T09:13:22.4394875Z #34 8347.7 adding 'vllm/v1/attention/backends/mla/triton_mla.py' 2025-09-07T09:13:22.4395332Z #34 8347.7 adding 'vllm/v1/core/__init__.py' 2025-09-07T09:13:22.4395716Z #34 8347.7 adding 'vllm/v1/core/block_pool.py' 2025-09-07T09:13:22.4396140Z #34 8347.7 adding 'vllm/v1/core/encoder_cache_manager.py' 2025-09-07T09:13:22.4396603Z #34 8347.7 adding 'vllm/v1/core/kv_cache_coordinator.py' 2025-09-07T09:13:22.4397037Z #34 8347.7 adding 'vllm/v1/core/kv_cache_manager.py' 2025-09-07T09:13:22.4397546Z #34 8347.7 adding 'vllm/v1/core/kv_cache_utils.py' 2025-09-07T09:13:22.4398010Z #34 8347.7 adding 'vllm/v1/core/single_type_kv_cache_manager.py' 2025-09-07T09:13:22.4398490Z #34 8347.7 adding 'vllm/v1/core/sched/__init__.py' 2025-09-07T09:13:22.4398921Z #34 8347.7 adding 'vllm/v1/core/sched/async_scheduler.py' 2025-09-07T09:13:22.4399461Z #34 8347.7 adding 'vllm/v1/core/sched/interface.py' 2025-09-07T09:13:22.4399878Z #34 8347.7 adding 'vllm/v1/core/sched/output.py' 2025-09-07T09:13:22.4400311Z #34 8347.7 adding 'vllm/v1/core/sched/request_queue.py' 2025-09-07T09:13:22.4400731Z #34 8347.7 adding 'vllm/v1/core/sched/scheduler.py' 
2025-09-07T09:13:22.4401139Z #34 8347.7 adding 'vllm/v1/core/sched/utils.py' 2025-09-07T09:13:22.4401520Z #34 8347.7 adding 'vllm/v1/engine/__init__.py' 2025-09-07T09:13:22.4401910Z #34 8347.7 adding 'vllm/v1/engine/async_llm.py' 2025-09-07T09:13:22.4402312Z #34 8347.7 adding 'vllm/v1/engine/coordinator.py' 2025-09-07T09:13:22.4402693Z #34 8347.7 adding 'vllm/v1/engine/core.py' 2025-09-07T09:13:22.4403076Z #34 8347.7 adding 'vllm/v1/engine/core_client.py' 2025-09-07T09:13:22.4403473Z #34 8347.7 adding 'vllm/v1/engine/detokenizer.py' 2025-09-07T09:13:22.4403874Z #34 8347.7 adding 'vllm/v1/engine/exceptions.py' 2025-09-07T09:13:22.4404265Z #34 8347.7 adding 'vllm/v1/engine/llm_engine.py' 2025-09-07T09:13:22.4404659Z #34 8347.7 adding 'vllm/v1/engine/logprobs.py' 2025-09-07T09:13:22.4405185Z #34 8347.7 adding 'vllm/v1/engine/output_processor.py' 2025-09-07T09:13:22.4405614Z #34 8347.7 adding 'vllm/v1/engine/parallel_sampling.py' 2025-09-07T09:13:22.4406083Z #34 8347.7 adding 'vllm/v1/engine/processor.py' 2025-09-07T09:13:22.4406448Z #34 8347.7 adding 'vllm/v1/engine/utils.py' 2025-09-07T09:13:22.4406818Z #34 8347.7 adding 'vllm/v1/executor/__init__.py' 2025-09-07T09:13:22.4407192Z #34 8347.7 adding 'vllm/v1/executor/abstract.py' 2025-09-07T09:13:22.4407623Z #34 8347.7 adding 'vllm/v1/executor/multiproc_executor.py' 2025-09-07T09:13:22.4408106Z #34 8347.7 adding 'vllm/v1/executor/ray_distributed_executor.py' 2025-09-07T09:13:22.4408552Z #34 8347.7 adding 'vllm/v1/metrics/__init__.py' 2025-09-07T09:13:22.4408940Z #34 8347.7 adding 'vllm/v1/metrics/loggers.py' 2025-09-07T09:13:22.4409317Z #34 8347.7 adding 'vllm/v1/metrics/prometheus.py' 2025-09-07T09:13:22.4409717Z #34 8347.7 adding 'vllm/v1/metrics/ray_wrappers.py' 2025-09-07T09:13:22.4410101Z #34 8347.7 adding 'vllm/v1/metrics/reader.py' 2025-09-07T09:13:22.6123564Z #34 8347.7 adding 'vllm/v1/metrics/stats.py' 2025-09-07T09:13:22.6124045Z #34 8347.7 adding 'vllm/v1/pool/__init__.py' 2025-09-07T09:13:22.6124490Z #34 
8347.7 adding 'vllm/v1/pool/metadata.py' 2025-09-07T09:13:22.6124875Z #34 8347.7 adding 'vllm/v1/sample/__init__.py' 2025-09-07T09:13:22.6125254Z #34 8347.7 adding 'vllm/v1/sample/metadata.py' 2025-09-07T09:13:22.6125678Z #34 8347.7 adding 'vllm/v1/sample/rejection_sampler.py' 2025-09-07T09:13:22.6126098Z #34 8347.7 adding 'vllm/v1/sample/sampler.py' 2025-09-07T09:13:22.6126572Z #34 8347.7 adding 'vllm/v1/sample/logits_processor/__init__.py' 2025-09-07T09:13:22.6127348Z #34 8347.7 adding 'vllm/v1/sample/logits_processor/builtin.py' 2025-09-07T09:13:22.6127913Z #34 8347.7 adding 'vllm/v1/sample/logits_processor/interface.py' 2025-09-07T09:13:22.6128424Z #34 8347.7 adding 'vllm/v1/sample/logits_processor/state.py' 2025-09-07T09:13:22.6128864Z #34 8347.7 adding 'vllm/v1/sample/ops/__init__.py' 2025-09-07T09:13:22.6129287Z #34 8347.7 adding 'vllm/v1/sample/ops/bad_words.py' 2025-09-07T09:13:22.6129698Z #34 8347.7 adding 'vllm/v1/sample/ops/logprobs.py' 2025-09-07T09:13:22.6130115Z #34 8347.7 adding 'vllm/v1/sample/ops/penalties.py' 2025-09-07T09:13:22.6130561Z #34 8347.7 adding 'vllm/v1/sample/ops/topk_topp_sampler.py' 2025-09-07T09:13:22.6131006Z #34 8347.7 adding 'vllm/v1/sample/tpu/__init__.py' 2025-09-07T09:13:22.6131416Z #34 8347.7 adding 'vllm/v1/sample/tpu/metadata.py' 2025-09-07T09:13:22.6131814Z #34 8347.7 adding 'vllm/v1/sample/tpu/sampler.py' 2025-09-07T09:13:22.6132465Z #34 8347.7 adding 'vllm/v1/spec_decode/__init__.py' 2025-09-07T09:13:22.6133054Z #34 8347.7 adding 'vllm/v1/spec_decode/eagle.py' 2025-09-07T09:13:22.6133492Z #34 8347.7 adding 'vllm/v1/spec_decode/medusa.py' 2025-09-07T09:13:22.6133900Z #34 8347.7 adding 'vllm/v1/spec_decode/metadata.py' 2025-09-07T09:13:22.6134327Z #34 8347.7 adding 'vllm/v1/spec_decode/metrics.py' 2025-09-07T09:13:22.6134768Z #34 8347.7 adding 'vllm/v1/spec_decode/ngram_proposer.py' 2025-09-07T09:13:22.6135217Z #34 8347.7 adding 'vllm/v1/spec_decode/utils.py' 2025-09-07T09:13:22.6135796Z #34 8347.7 adding 
'vllm/v1/structured_output/__init__.py' 2025-09-07T09:13:22.6136300Z #34 8347.7 adding 'vllm/v1/structured_output/backend_guidance.py' 2025-09-07T09:13:22.6136896Z #34 8347.7 adding 'vllm/v1/structured_output/backend_lm_format_enforcer.py' 2025-09-07T09:13:22.6137480Z #34 8347.7 adding 'vllm/v1/structured_output/backend_outlines.py' 2025-09-07T09:13:22.6138016Z #34 8347.7 adding 'vllm/v1/structured_output/backend_types.py' 2025-09-07T09:13:22.6138540Z #34 8347.7 adding 'vllm/v1/structured_output/backend_xgrammar.py' 2025-09-07T09:13:22.6139053Z #34 8347.7 adding 'vllm/v1/structured_output/request.py' 2025-09-07T09:13:22.6139511Z #34 8347.7 adding 'vllm/v1/structured_output/utils.py' 2025-09-07T09:13:22.6139924Z #34 8347.7 adding 'vllm/v1/worker/__init__.py' 2025-09-07T09:13:22.6140331Z #34 8347.7 adding 'vllm/v1/worker/block_table.py' 2025-09-07T09:13:22.6140756Z #34 8347.7 adding 'vllm/v1/worker/cpu_model_runner.py' 2025-09-07T09:13:22.6141184Z #34 8347.7 adding 'vllm/v1/worker/cpu_worker.py' 2025-09-07T09:13:22.6141597Z #34 8347.7 adding 'vllm/v1/worker/gpu_input_batch.py' 2025-09-07T09:13:22.6142127Z #34 8347.7 adding 'vllm/v1/worker/gpu_model_runner.py' 2025-09-07T09:13:22.6142542Z #34 8347.7 adding 'vllm/v1/worker/gpu_worker.py' 2025-09-07T09:13:22.6143039Z #34 8347.7 adding 'vllm/v1/worker/kv_connector_model_runner_mixin.py' 2025-09-07T09:13:22.6143585Z #34 8347.7 adding 'vllm/v1/worker/lora_model_runner_mixin.py' 2025-09-07T09:13:22.6144216Z #34 8347.7 adding 'vllm/v1/worker/tpu_input_batch.py' 2025-09-07T09:13:22.6144653Z #34 8347.7 adding 'vllm/v1/worker/tpu_model_runner.py' 2025-09-07T09:13:22.6145059Z #34 8347.7 adding 'vllm/v1/worker/tpu_worker.py' 2025-09-07T09:13:22.6145446Z #34 8347.7 adding 'vllm/v1/worker/utils.py' 2025-09-07T09:13:22.6145815Z #34 8347.7 adding 'vllm/v1/worker/worker_base.py' 2025-09-07T09:13:22.6146229Z #34 8347.7 adding 'vllm/v1/worker/xpu_model_runner.py' 2025-09-07T09:13:22.6146700Z #34 8347.7 adding 
'vllm/v1/worker/xpu_worker.py' 2025-09-07T09:13:22.6147085Z #34 8347.7 adding 'vllm/vllm_flash_attn/.gitkeep' 2025-09-07T09:13:22.6147503Z #34 8347.7 adding 'vllm/vllm_flash_attn/__init__.py' 2025-09-07T09:13:33.7613890Z #34 8359.0 adding 'vllm/vllm_flash_attn/_vllm_fa2_C.abi3.so' 2025-09-07T09:14:07.6238509Z #34 8392.9 adding 'vllm/vllm_flash_attn/_vllm_fa3_C.abi3.so' 2025-09-07T09:14:09.1214231Z #34 8394.4 adding 'vllm/vllm_flash_attn/flash_attn_interface.py' 2025-09-07T09:14:09.2954820Z #34 8394.4 adding 'vllm/vllm_flash_attn/layers/__init__.py' 2025-09-07T09:14:09.2955597Z #34 8394.4 adding 'vllm/vllm_flash_attn/layers/rotary.py' 2025-09-07T09:14:09.2956142Z #34 8394.4 adding 'vllm/vllm_flash_attn/ops/triton/__init__.py' 2025-09-07T09:14:09.2956632Z #34 8394.4 adding 'vllm/vllm_flash_attn/ops/triton/rotary.py' 2025-09-07T09:14:09.2957072Z #34 8394.4 adding 'vllm/worker/__init__.py' 2025-09-07T09:14:09.2957454Z #34 8394.4 adding 'vllm/worker/cache_engine.py' 2025-09-07T09:14:09.2957875Z #34 8394.4 adding 'vllm/worker/enc_dec_model_runner.py' 2025-09-07T09:14:09.2958298Z #34 8394.4 adding 'vllm/worker/model_runner.py' 2025-09-07T09:14:09.2958712Z #34 8394.4 adding 'vllm/worker/model_runner_base.py' 2025-09-07T09:14:09.2959109Z #34 8394.4 adding 'vllm/worker/utils.py' 2025-09-07T09:14:09.2959451Z #34 8394.4 adding 'vllm/worker/worker.py' 2025-09-07T09:14:09.2959821Z #34 8394.4 adding 'vllm/worker/worker_base.py' 2025-09-07T09:14:09.2960371Z #34 8394.4 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.dist-info/licenses/LICENSE' 2025-09-07T09:14:09.2961130Z #34 8394.4 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.dist-info/METADATA' 2025-09-07T09:14:09.2961763Z #34 8394.4 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.dist-info/WHEEL' 2025-09-07T09:14:09.2962414Z #34 8394.4 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.dist-info/entry_points.txt' 2025-09-07T09:14:09.2963115Z #34 8394.4 adding 
'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.dist-info/top_level.txt' 2025-09-07T09:14:09.2963760Z #34 8394.4 adding 'vllm-0.10.2rc2.dev125+g4172235ab.d20250907.dist-info/RECORD' 2025-09-07T09:14:09.2964284Z #34 8394.4 removing build/bdist.linux-x86_64/wheel 2025-09-07T09:14:09.5980565Z #34 8394.9 Compile requests 504 2025-09-07T09:14:09.5981070Z #34 8394.9 Compile requests executed 504 2025-09-07T09:14:09.5981627Z #34 8394.9 Cache hits 71 2025-09-07T09:14:09.5982160Z #34 8394.9 Cache hits (C/C++) 6 2025-09-07T09:14:09.5982707Z #34 8394.9 Cache hits (CUDA) 65 2025-09-07T09:14:09.5986620Z #34 8394.9 Cache misses 433 2025-09-07T09:14:09.5987132Z #34 8394.9 Cache misses (C/C++) 4 2025-09-07T09:14:09.5987677Z #34 8394.9 Cache misses (CUDA) 429 2025-09-07T09:14:09.5988190Z #34 8394.9 Cache timeouts 0 2025-09-07T09:14:09.5988732Z #34 8394.9 Cache read errors 0 2025-09-07T09:14:09.5989236Z #34 8394.9 Forced recaches 0 2025-09-07T09:14:09.5989698Z #34 8394.9 Cache write errors 0 2025-09-07T09:14:09.5990338Z #34 8394.9 Compilation failures 0 2025-09-07T09:14:09.5990754Z #34 8394.9 Cache errors 0 2025-09-07T09:14:09.5991177Z #34 8394.9 Non-cacheable compilations 0 2025-09-07T09:14:09.5991592Z #34 8394.9 Non-cacheable calls 0 2025-09-07T09:14:09.5992261Z #34 8394.9 Non-compilation calls 0 2025-09-07T09:14:09.5992695Z #34 8394.9 Unsupported compiler calls 0 2025-09-07T09:14:09.5993133Z #34 8394.9 Average cache write 0.088 s 2025-09-07T09:14:09.5993547Z #34 8394.9 Average compiler 182.818 s 2025-09-07T09:14:09.5993968Z #34 8394.9 Average cache read hit 0.097 s 2025-09-07T09:14:09.5994399Z #34 8394.9 Failed distributed compilations 0 2025-09-07T09:14:09.5995057Z #34 8394.9 Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-09-07T09:14:09.5995617Z #34 8394.9 Version (client) 0.8.1 2025-09-07T09:14:09.9762687Z #34 DONE 8395.3s 2025-09-07T09:14:10.1281855Z 2025-09-07T09:14:10.1286156Z #35 [build 6/7] RUN --mount=type=cache,target=/root/.cache/ccache 
--mount=type=cache,target=/root/.cache/uv --mount=type=bind,source=.git,target=.git if [ "1" != "1" ]; then rm -rf .deps && mkdir -p .deps && export VLLM_DOCKER_BUILD_CONTEXT=1 && python3 setup.py bdist_wheel --dist-dir=vllm-dist --py-limited-api=cp38; fi 2025-09-07T09:14:10.4608184Z #35 DONE 0.5s 2025-09-07T09:14:10.6123906Z 2025-09-07T09:14:10.6124863Z #36 [build 7/7] RUN echo "[INFO] Listing current directory:" && ls -al && echo "[INFO] Showing torch_build_versions.txt content:" && cat torch_build_versions.txt 2025-09-07T09:14:11.6693862Z #36 1.207 [INFO] Listing current directory: 2025-09-07T09:14:11.8420235Z #36 1.213 total 356 2025-09-07T09:14:11.8420660Z #36 1.213 drwxr-xr-x. 1 root root 94 Sep 7 09:12 . 2025-09-07T09:14:11.8421168Z #36 1.213 drwxr-xr-x. 1 root root 6 Sep 7 09:14 .. 2025-09-07T09:14:11.8421668Z #36 1.213 drwxr-xr-x. 10 root root 16384 Sep 7 06:20 benchmarks 2025-09-07T09:14:11.8422161Z #36 1.213 drwxr-xr-x. 5 root root 105 Sep 7 09:12 build 2025-09-07T09:14:11.8422643Z #36 1.213 drwxr-xr-x. 5 root root 16384 Sep 7 06:20 .buildkite 2025-09-07T09:14:11.8423159Z #36 1.213 -rw-r--r--. 1 root root 641 Sep 7 06:20 .clang-format 2025-09-07T09:14:11.8423918Z #36 1.213 drwxr-xr-x. 3 root root 94 Sep 7 06:20 cmake 2025-09-07T09:14:11.8424430Z #36 1.213 -rw-r--r--. 1 root root 38227 Sep 7 06:20 CMakeLists.txt 2025-09-07T09:14:11.8425051Z #36 1.213 -rw-r--r--. 1 root root 5318 Sep 7 06:20 CODE_OF_CONDUCT.md 2025-09-07T09:14:11.8425552Z #36 1.213 -rw-r--r--. 1 root root 140 Sep 7 06:20 CONTRIBUTING.md 2025-09-07T09:14:11.8426035Z #36 1.213 drwxr-xr-x. 1 root root 63 Sep 7 06:20 csrc 2025-09-07T09:14:11.8426476Z #36 1.213 -rw-r--r--. 1 root root 1366 Sep 7 06:20 DCO 2025-09-07T09:14:11.8426924Z #36 1.213 drwxr-xr-x. 10 root root 16384 Sep 7 06:54 .deps 2025-09-07T09:14:11.8427390Z #36 1.213 drwxr-xr-x. 2 root root 16384 Sep 7 06:20 docker 2025-09-07T09:14:11.8427852Z #36 1.213 -rw-r--r--. 
1 root root 345 Sep 7 06:20 .dockerignore 2025-09-07T09:14:11.8428322Z #36 1.213 drwxr-xr-x. 18 root root 16384 Sep 7 06:20 docs 2025-09-07T09:14:11.8428773Z #36 1.213 drwxr-xr-x. 5 root root 16384 Sep 7 06:20 examples 2025-09-07T09:14:11.8429279Z #36 1.213 -rw-r--r--. 1 root root 944 Sep 7 06:20 find_cuda_init.py 2025-09-07T09:14:11.8429765Z #36 1.213 -rwxr-xr-x. 1 root root 284 Sep 7 06:20 format.sh 2025-09-07T09:14:11.8430232Z #36 1.213 drwxr-xr-x. 2 root root 25 Sep 7 06:20 .gemini 2025-09-07T09:14:11.8430700Z #36 1.213 drwxr-xr-x. 8 root root 181 Sep 7 06:20 .git 2025-09-07T09:14:11.8431146Z #36 1.213 drwxr-xr-x. 5 root root 16384 Sep 7 06:20 .github 2025-09-07T09:14:11.8431613Z #36 1.213 -rw-r--r--. 1 root root 3734 Sep 7 06:20 .gitignore 2025-09-07T09:14:11.8432061Z #36 1.213 -rw-r--r--. 1 root root 11357 Sep 7 06:20 LICENSE 2025-09-07T09:14:11.8432775Z #36 1.213 -rw-r--r--. 1 root root 212 Sep 7 06:20 MANIFEST.in 2025-09-07T09:14:11.8433248Z #36 1.213 -rw-r--r--. 1 root root 165 Sep 7 06:20 .markdownlint.yaml 2025-09-07T09:14:11.8433734Z #36 1.213 -rw-r--r--. 1 root root 4237 Sep 7 06:20 mkdocs.yaml 2025-09-07T09:14:11.8434236Z #36 1.213 -rw-r--r--. 1 root root 6134 Sep 7 06:20 .pre-commit-config.yaml 2025-09-07T09:14:11.8434746Z #36 1.213 -rw-r--r--. 1 root root 8187 Sep 7 06:54 pyproject.toml 2025-09-07T09:14:11.8435214Z #36 1.213 -rw-r--r--. 1 root root 12531 Sep 7 06:20 README.md 2025-09-07T09:14:11.8435676Z #36 1.213 -rw-r--r--. 1 root root 416 Sep 7 06:20 .readthedocs.yaml 2025-09-07T09:14:11.8436158Z #36 1.213 -rw-r--r--. 1 root root 5696 Sep 7 06:20 RELEASE.md 2025-09-07T09:14:11.8436675Z #36 1.213 drwxr-xr-x. 1 root root 159 Sep 7 06:20 requirements 2025-09-07T09:14:11.8437152Z #36 1.213 -rw-r--r--. 1 root root 3657 Sep 7 06:20 SECURITY.md 2025-09-07T09:14:11.8437609Z #36 1.213 -rw-r--r--. 1 root root 24740 Sep 7 06:20 setup.py 2025-09-07T09:14:11.8438055Z #36 1.213 -rw-r--r--. 
1 root root 496 Sep 7 06:20 .shellcheckrc 2025-09-07T09:14:11.8438517Z #36 1.213 drwxr-xr-x. 46 root root 16384 Sep 7 06:20 tests 2025-09-07T09:14:11.8438948Z #36 1.213 drwxr-xr-x. 2 root root 16384 Sep 7 06:20 tmp 2025-09-07T09:14:11.8439391Z #36 1.213 drwxr-xr-x. 4 root root 16384 Sep 7 06:20 tools 2025-09-07T09:14:11.8439935Z #36 1.213 -rw-r--r--. 1 root root 290 Sep 7 06:53 torch_build_versions.txt 2025-09-07T09:14:11.8440475Z #36 1.213 -rw-r--r--. 1 root root 654 Sep 7 06:20 use_existing_torch.py 2025-09-07T09:14:11.8440962Z #36 1.213 drwxr-xr-x. 1 root root 67 Sep 7 06:54 vllm 2025-09-07T09:14:11.8441402Z #36 1.213 drwxr-xr-x. 2 root root 83 Sep 7 09:12 vllm-dist 2025-09-07T09:14:11.8441884Z #36 1.213 drwxr-xr-x. 2 root root 134 Sep 7 06:54 vllm.egg-info 2025-09-07T09:14:11.8442366Z #36 1.213 drwxr-xr-x. 2 root root 75 Sep 7 06:52 xformers-dist 2025-09-07T09:14:11.8442849Z #36 1.213 -rw-r--r--. 1 root root 15 Sep 7 06:20 .yapfignore 2025-09-07T09:14:11.8443292Z #36 1.213 [INFO] Showing torch_build_versions.txt content: 2025-09-07T09:14:11.8443893Z #36 1.215 torch @ file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T09:14:11.8444710Z #36 1.215 torchaudio @ file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T09:14:11.8445621Z #36 1.215 torchvision @ file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T09:14:11.8446227Z #36 DONE 1.2s 2025-09-07T09:14:13.3456348Z 2025-09-07T09:14:13.3457142Z #37 [export-wheels 2/4] COPY --from=build /workspace/vllm-dist /wheels/vllm 2025-09-07T09:14:13.5603257Z #37 DONE 0.0s 2025-09-07T09:14:13.5603535Z 2025-09-07T09:14:13.5603825Z #38 [vllm-base 6/18] COPY --from=build /workspace/vllm-dist /wheels/vllm 2025-09-07T09:14:13.5604435Z #38 DONE 0.0s 2025-09-07T09:14:13.5604575Z 2025-09-07T09:14:13.5605294Z #39 [vllm-base 7/18] RUN echo "[INFO] Listing current directory before torch install step:" && ls 
-al && echo "[INFO] Showing torch_build_versions.txt content:" && cat torch_build_versions.txt 2025-09-07T09:14:13.9005561Z #39 0.491 [INFO] Listing current directory before torch install step: 2025-09-07T09:14:14.0680972Z #39 0.495 total 4 2025-09-07T09:14:14.0681388Z #39 0.495 drwxr-xr-x. 1 root root 38 Sep 7 06:53 . 2025-09-07T09:14:14.0681892Z #39 0.495 drwxr-xr-x. 1 root root 6 Sep 7 09:14 .. 2025-09-07T09:14:14.0682423Z #39 0.495 -rw-r--r--. 1 root root 290 Sep 7 06:53 torch_build_versions.txt 2025-09-07T09:14:14.0682945Z #39 0.495 [INFO] Showing torch_build_versions.txt content: 2025-09-07T09:14:14.0683582Z #39 0.498 torch @ file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T09:14:14.0684432Z #39 0.498 torchaudio @ file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T09:14:14.0685360Z #39 0.498 torchvision @ file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T09:14:14.0686255Z #39 DONE 0.5s 2025-09-07T09:14:14.0686404Z 2025-09-07T09:14:14.0686695Z #40 [vllm-base 8/18] RUN ldconfig /usr/local/cuda-$(echo 12.8.1 | cut -d. -f1,2)/compat/ 2025-09-07T09:14:15.0703335Z #40 DONE 1.2s 2025-09-07T09:14:15.2220553Z 2025-09-07T09:14:15.2222597Z #41 [vllm-base 9/18] RUN --mount=type=cache,target=/root/.cache/uv if ! 
python3 -m uv --version > /dev/null 2>&1; then python3 -m pip install uv==0.8.4; fi 2025-09-07T09:14:16.6776968Z #41 1.607 Collecting uv==0.8.4 2025-09-07T09:14:16.8081174Z #41 1.622 Downloading uv-0.8.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB) 2025-09-07T09:14:16.8082061Z #41 1.637 Downloading uv-0.8.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.8 MB) 2025-09-07T09:14:17.0228357Z #41 1.737 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.8/18.8 MB 201.7 MB/s 0:00:00 2025-09-07T09:14:17.0228948Z #41 1.801 Installing collected packages: uv 2025-09-07T09:14:17.1595785Z #41 2.088 Successfully installed uv-0.8.4 2025-09-07T09:14:17.2838444Z #41 2.089 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-09-07T09:14:17.4352090Z #41 DONE 2.2s 2025-09-07T09:14:17.4352568Z 2025-09-07T09:14:17.4359005Z #42 [vllm-base 10/18] RUN --mount=type=bind,source=tmp,target=/dist --mount=type=cache,target=/root/.cache/uv if [ -n "tmp" ] && [ "tmp" != "./requirements" ] && [ -d "/dist" ] && ls /dist/torch*.whl >/dev/null 2>&1; then torch_whl=$(find /dist -maxdepth 1 -name 'torch-*.whl' -print -quit); vision_whl=$(find /dist -name 'torchvision*.whl' | head -n1 | xargs); audio_whl=$(find /dist -name 'torchaudio*.whl' | head -n1 | xargs); echo "[INFO] Use wheels to build : '${torch_whl}' '${audio_whl}' '${vision_whl}'"; uv pip install --system "${torch_whl}[opt-einsum]" "${vision_whl}" "${audio_whl}" /dist/*.whl; else echo "[INFO] Installing torch versions from torch_build_versions.txt"; uv pip install --system $(cat torch_build_versions.txt | xargs) --index-url https://download.pytorch.org/whl/nightly/cu$(echo 12.8.1 | cut -d. 
-f1,2 | tr -d '.'); fi 2025-09-07T09:14:18.3281950Z #42 1.043 [INFO] Use wheels to build : '/dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl' '/dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl' '/dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl' 2025-09-07T09:14:18.5676204Z #42 1.055 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T09:14:18.5677514Z #42 1.093 Resolved 31 packages in 34ms 2025-09-07T09:14:18.5678454Z #42 1.132 Uninstalled 1 package in 38ms 2025-09-07T09:14:20.1659966Z #42 2.881 Installed 31 packages in 1.74s 2025-09-07T09:14:20.3206973Z #42 2.885 + filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T09:14:20.3207629Z #42 2.885 + fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T09:14:20.3208218Z #42 2.885 + jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T09:14:20.3209018Z #42 2.885 + markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:14:20.3209821Z #42 2.885 + mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T09:14:20.3210393Z #42 2.885 + networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T09:14:20.3211123Z #42 2.885 + numpy==2.3.2 (from file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:14:20.3212041Z #42 2.885 + nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:14:20.3213666Z #42 2.885 + nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:14:20.3215716Z #42 2.885 + nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 
2025-09-07T09:14:20.3216959Z #42 2.885 + nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:14:20.3218065Z #42 2.885 + nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:14:20.3219118Z #42 2.885 + nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:14:20.3220340Z #42 2.885 + nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:14:20.3221395Z #42 2.885 + nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:14:20.3222405Z #42 2.885 + nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:14:20.3223566Z #42 2.885 + nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:14:20.3224793Z #42 2.885 + nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T09:14:20.3225908Z #42 2.885 + nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:14:20.3227121Z #42 2.885 + nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T09:14:20.3229181Z #42 2.885 + nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:14:20.3230667Z #42 2.885 + nvidia-nvtx-cu12==12.8.90 (from 
file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:14:20.3231401Z #42 2.885 + opt-einsum==3.4.0 2025-09-07T09:14:20.3231988Z #42 2.885 + pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:14:20.3233066Z #42 2.885 + pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:14:20.3233900Z #42 2.885 - setuptools==80.9.0 2025-09-07T09:14:20.3234351Z #42 2.885 + setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T09:14:20.3234951Z #42 2.885 + sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T09:14:20.3235721Z #42 2.885 + torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:14:20.3236763Z #42 2.885 + torchaudio==2.8.0.dev20250906+cu128 (from file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:14:20.3237903Z #42 2.886 + torchvision==0.24.0.dev20250906+cu128 (from file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:14:20.3238850Z #42 2.886 + typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T09:15:13.2516566Z #42 DONE 56.0s 2025-09-07T09:15:13.4047528Z 2025-09-07T09:15:13.4048508Z #43 [vllm-base 11/18] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system /wheels/vllm/*.whl --verbose 2025-09-07T09:15:13.7061400Z #43 0.452 DEBUG uv 0.8.4 2025-09-07T09:15:13.8096536Z #43 0.456 DEBUG Searching for default Python interpreter in managed installations or search path 2025-09-07T09:15:13.8097376Z #43 0.456 DEBUG Searching for managed installations at `/root/.local/share/uv/python` 2025-09-07T09:15:13.8098358Z #43 0.458 DEBUG Found 
`cpython-3.12.11-linux-x86_64-gnu` at `/opt/python/cp312-cp312/bin/python` (first executable in the search path) 2025-09-07T09:15:13.8099246Z #43 0.458 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T09:15:13.8099788Z #43 0.459 DEBUG Acquired lock for `/opt/python/cp312-cp312` 2025-09-07T09:15:13.8100664Z #43 0.461 DEBUG At least one requirement is not satisfied: file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-linux_x86_64.whl 2025-09-07T09:15:13.8101495Z #43 0.461 DEBUG Using request timeout of 500s 2025-09-07T09:15:13.8102179Z #43 0.473 DEBUG Solving with installed Python version: 3.12.11 2025-09-07T09:15:13.8102679Z #43 0.473 DEBUG Solving with target Python version: >=3.12.11 2025-09-07T09:15:13.8103152Z #43 0.474 DEBUG Adding direct dependency: vllm* 2025-09-07T09:15:13.8103998Z #43 0.474 DEBUG Searching for a compatible version of vllm @ file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-linux_x86_64.whl (*) 2025-09-07T09:15:13.8105381Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: regex* 2025-09-07T09:15:13.8106345Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: cachetools* 2025-09-07T09:15:13.8107192Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: psutil* 2025-09-07T09:15:13.8108062Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: sentencepiece* 2025-09-07T09:15:13.8108905Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: numpy* 2025-09-07T09:15:13.8109887Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: requests>=2.26.0 2025-09-07T09:15:13.8110725Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: tqdm* 2025-09-07T09:15:13.8111515Z #43 0.474 DEBUG Adding transitive 
dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: blake3* 2025-09-07T09:15:13.8112344Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: py-cpuinfo* 2025-09-07T09:15:13.8113273Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: transformers>=4.55.2 2025-09-07T09:15:13.8114177Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: tokenizers>=0.21.1 2025-09-07T09:15:13.8115035Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: protobuf* 2025-09-07T09:15:13.8115923Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: fastapi[standard]>=0.115.0 2025-09-07T09:15:13.8116817Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: aiohttp* 2025-09-07T09:15:13.8117645Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: openai>=1.99.1 2025-09-07T09:15:13.8118507Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: pydantic>=2.11.7 2025-09-07T09:15:13.8119439Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: prometheus-client>=0.18.0 2025-09-07T09:15:13.8120314Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: pillow* 2025-09-07T09:15:13.8121292Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: prometheus-fastapi-instrumentator>=7.0.0 2025-09-07T09:15:13.8122282Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: tiktoken>=0.6.0 2025-09-07T09:15:13.8123256Z #43 0.474 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: lm-format-enforcer>=0.11.3, <0.11.3+ 2025-09-07T09:15:13.8124725Z #43 0.475 DEBUG Adding transitive dependency 
for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: llguidance{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}>=0.7.11, <0.8.0 2025-09-07T09:15:13.8126222Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: outlines-core{platform_machine != 's390x'}>=0.2.10, <0.2.10+ 2025-09-07T09:15:13.8127336Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: diskcache>=5.6.3, <5.6.3+ 2025-09-07T09:15:13.8128260Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: lark>=1.2.2, <1.2.2+ 2025-09-07T09:15:13.8129625Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: xgrammar{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}>=0.1.23, <0.1.23+ 2025-09-07T09:15:13.8130967Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: typing-extensions>=4.10 2025-09-07T09:15:13.8131867Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: filelock>=3.16.1 2025-09-07T09:15:13.8132893Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: partial-json-parser* 2025-09-07T09:15:13.8133979Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: pyzmq>=25.0.0 2025-09-07T09:15:13.8134829Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: msgspec* 2025-09-07T09:15:13.8135688Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: gguf>=0.13.0 2025-09-07T09:15:13.8136633Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: mistral-common[audio]>=1.8.2 2025-09-07T09:15:13.8137662Z #43 0.475 DEBUG Adding transitive dependency for 
vllm==0.10.2rc2.dev125+g4172235ab.d20250907: mistral-common[image]>=1.8.2 2025-09-07T09:15:13.8138710Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: opencv-python-headless>=4.11.0 2025-09-07T09:15:13.8139648Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: pyyaml* 2025-09-07T09:15:13.8140677Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: six{python_full_version >= '3.12'}>=1.16.0 2025-09-07T09:15:13.8141978Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: setuptools{python_full_version >= '3.12'}>=77.0.3, <80 2025-09-07T09:15:13.8143014Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: einops* 2025-09-07T09:15:13.8143994Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: compressed-tensors>=0.11.0, <0.11.0+ 2025-09-07T09:15:13.8145014Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: depyf>=0.19.0, <0.19.0+ 2025-09-07T09:15:13.8146023Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: cloudpickle* 2025-09-07T09:15:13.8146928Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: watchfiles* 2025-09-07T09:15:13.8147795Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: python-json-logger* 2025-09-07T09:15:13.8148653Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: scipy* 2025-09-07T09:15:13.8149439Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: ninja* 2025-09-07T09:15:13.8150255Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: pybase64* 2025-09-07T09:15:13.8151104Z #43 0.475 DEBUG Adding 
transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: cbor2* 2025-09-07T09:15:13.8151920Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: setproctitle* 2025-09-07T09:15:13.8152809Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: openai-harmony>=0.0.3 2025-09-07T09:15:13.8154265Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: numba{python_full_version >= '3.10'}>=0.61.2, <0.61.2+ 2025-09-07T09:15:13.8156264Z #43 0.475 DEBUG Adding transitive dependency for vllm==0.10.2rc2.dev125+g4172235ab.d20250907: ray[cgraph]>=2.48.0 2025-09-07T09:15:13.8157656Z #43 0.481 DEBUG Found stale response for: https://pypi.org/simple/cachetools/ 2025-09-07T09:15:13.8158968Z #43 0.481 DEBUG Sending revalidation request for: https://pypi.org/simple/cachetools/ 2025-09-07T09:15:13.8160182Z #43 0.482 DEBUG Found stale response for: https://pypi.org/simple/psutil/ 2025-09-07T09:15:13.8160967Z #43 0.482 DEBUG Sending revalidation request for: https://pypi.org/simple/psutil/ 2025-09-07T09:15:13.8161651Z #43 0.482 DEBUG Found stale response for: https://pypi.org/simple/sentencepiece/ 2025-09-07T09:15:13.8162349Z #43 0.482 DEBUG Sending revalidation request for: https://pypi.org/simple/sentencepiece/ 2025-09-07T09:15:13.8163044Z #43 0.482 DEBUG Found stale response for: https://pypi.org/simple/requests/ 2025-09-07T09:15:13.8163841Z #43 0.482 DEBUG Sending revalidation request for: https://pypi.org/simple/requests/ 2025-09-07T09:15:13.8164496Z #43 0.482 DEBUG Found stale response for: https://pypi.org/simple/tqdm/ 2025-09-07T09:15:13.8165171Z #43 0.482 DEBUG Sending revalidation request for: https://pypi.org/simple/tqdm/ 2025-09-07T09:15:13.8165796Z #43 0.482 DEBUG Found stale response for: https://pypi.org/simple/blake3/ 2025-09-07T09:15:13.8166448Z #43 0.482 DEBUG Sending revalidation request for: https://pypi.org/simple/blake3/ 
2025-09-07T09:15:13.8167116Z #43 0.482 DEBUG Found stale response for: https://pypi.org/simple/py-cpuinfo/ 2025-09-07T09:15:13.8167810Z #43 0.482 DEBUG Sending revalidation request for: https://pypi.org/simple/py-cpuinfo/ 2025-09-07T09:15:13.8168499Z #43 0.483 DEBUG Found stale response for: https://pypi.org/simple/transformers/ 2025-09-07T09:15:13.8169252Z #43 0.483 DEBUG Sending revalidation request for: https://pypi.org/simple/transformers/ 2025-09-07T09:15:13.8170002Z #43 0.483 DEBUG Found stale response for: https://pypi.org/simple/fastapi/ 2025-09-07T09:15:13.8170704Z #43 0.483 DEBUG Sending revalidation request for: https://pypi.org/simple/fastapi/ 2025-09-07T09:15:13.8171345Z #43 0.483 DEBUG Found stale response for: https://pypi.org/simple/openai/ 2025-09-07T09:15:13.8172032Z #43 0.483 DEBUG Sending revalidation request for: https://pypi.org/simple/openai/ 2025-09-07T09:15:13.8172678Z #43 0.483 DEBUG Found stale response for: https://pypi.org/simple/pydantic/ 2025-09-07T09:15:13.8173974Z #43 0.483 DEBUG Sending revalidation request for: https://pypi.org/simple/pydantic/ 2025-09-07T09:15:13.8174718Z #43 0.483 DEBUG Found stale response for: https://pypi.org/simple/prometheus-client/ 2025-09-07T09:15:13.8175481Z #43 0.483 DEBUG Sending revalidation request for: https://pypi.org/simple/prometheus-client/ 2025-09-07T09:15:13.8176335Z #43 0.483 DEBUG Found stale response for: https://pypi.org/simple/prometheus-fastapi-instrumentator/ 2025-09-07T09:15:13.8177252Z #43 0.483 DEBUG Sending revalidation request for: https://pypi.org/simple/prometheus-fastapi-instrumentator/ 2025-09-07T09:15:13.8178102Z #43 0.483 DEBUG Found stale response for: https://pypi.org/simple/tiktoken/ 2025-09-07T09:15:13.8178788Z #43 0.483 DEBUG Sending revalidation request for: https://pypi.org/simple/tiktoken/ 2025-09-07T09:15:13.8179508Z #43 0.483 DEBUG Found stale response for: https://pypi.org/simple/lm-format-enforcer/ 2025-09-07T09:15:13.8180331Z #43 0.483 DEBUG Sending revalidation 
request for: https://pypi.org/simple/lm-format-enforcer/ 2025-09-07T09:15:13.8181124Z #43 0.483 DEBUG Found stale response for: https://pypi.org/simple/llguidance/ 2025-09-07T09:15:13.8182320Z #43 0.483 DEBUG Sending revalidation request for: https://pypi.org/simple/llguidance/ 2025-09-07T09:15:13.8183728Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/outlines-core/ 2025-09-07T09:15:13.8184983Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/outlines-core/ 2025-09-07T09:15:13.8186399Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/diskcache/ 2025-09-07T09:15:13.8187658Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/diskcache/ 2025-09-07T09:15:13.8188949Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/lark/ 2025-09-07T09:15:13.8190144Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/lark/ 2025-09-07T09:15:13.8191479Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/xgrammar/ 2025-09-07T09:15:13.8192917Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/xgrammar/ 2025-09-07T09:15:13.8194195Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T09:15:13.8195558Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/typing-extensions/ 2025-09-07T09:15:13.8196834Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/filelock/ 2025-09-07T09:15:13.8198234Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/filelock/ 2025-09-07T09:15:13.8199735Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/partial-json-parser/ 2025-09-07T09:15:13.8201189Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/partial-json-parser/ 2025-09-07T09:15:13.8202503Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/msgspec/ 2025-09-07T09:15:13.8203814Z #43 0.484 DEBUG 
Sending revalidation request for: https://pypi.org/simple/msgspec/ 2025-09-07T09:15:13.8204954Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/gguf/ 2025-09-07T09:15:13.8206109Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/gguf/ 2025-09-07T09:15:13.8207328Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/mistral-common/ 2025-09-07T09:15:13.8208654Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/mistral-common/ 2025-09-07T09:15:13.8210235Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/opencv-python-headless/ 2025-09-07T09:15:13.8211712Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/opencv-python-headless/ 2025-09-07T09:15:13.8213410Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/pyyaml/ 2025-09-07T09:15:13.8214646Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/pyyaml/ 2025-09-07T09:15:13.8215846Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/six/ 2025-09-07T09:15:13.8217031Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/six/ 2025-09-07T09:15:13.8218296Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/setuptools/ 2025-09-07T09:15:13.8219591Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/setuptools/ 2025-09-07T09:15:13.8220886Z #43 0.484 DEBUG Found stale response for: https://pypi.org/simple/einops/ 2025-09-07T09:15:13.8222189Z #43 0.484 DEBUG Sending revalidation request for: https://pypi.org/simple/einops/ 2025-09-07T09:15:13.8223546Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/compressed-tensors/ 2025-09-07T09:15:13.8225052Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/compressed-tensors/ 2025-09-07T09:15:13.8226469Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/depyf/ 2025-09-07T09:15:13.8227632Z #43 0.485 DEBUG 
Sending revalidation request for: https://pypi.org/simple/depyf/ 2025-09-07T09:15:13.8228782Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/cloudpickle/ 2025-09-07T09:15:13.8232430Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/cloudpickle/ 2025-09-07T09:15:13.8233644Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/watchfiles/ 2025-09-07T09:15:13.8234888Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/watchfiles/ 2025-09-07T09:15:13.8236203Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/python-json-logger/ 2025-09-07T09:15:13.8237535Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/python-json-logger/ 2025-09-07T09:15:13.8238803Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/scipy/ 2025-09-07T09:15:13.8239920Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/scipy/ 2025-09-07T09:15:13.8241052Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/ninja/ 2025-09-07T09:15:13.8242283Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/ninja/ 2025-09-07T09:15:13.8243122Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/pybase64/ 2025-09-07T09:15:13.8243792Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/pybase64/ 2025-09-07T09:15:13.8244436Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/cbor2/ 2025-09-07T09:15:13.8245067Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/cbor2/ 2025-09-07T09:15:13.8245778Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/setproctitle/ 2025-09-07T09:15:13.8246476Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/setproctitle/ 2025-09-07T09:15:13.8247187Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/openai-harmony/ 2025-09-07T09:15:13.8247894Z #43 0.485 DEBUG Sending 
revalidation request for: https://pypi.org/simple/openai-harmony/ 2025-09-07T09:15:13.8248579Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/numba/ 2025-09-07T09:15:13.8249202Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/numba/ 2025-09-07T09:15:13.8249842Z #43 0.485 DEBUG Found stale response for: https://pypi.org/simple/numpy/ 2025-09-07T09:15:13.8250462Z #43 0.485 DEBUG Sending revalidation request for: https://pypi.org/simple/numpy/ 2025-09-07T09:15:13.8251120Z #43 0.486 DEBUG Found stale response for: https://pypi.org/simple/tokenizers/ 2025-09-07T09:15:13.8251852Z #43 0.486 DEBUG Sending revalidation request for: https://pypi.org/simple/tokenizers/ 2025-09-07T09:15:13.8252510Z #43 0.486 DEBUG Found stale response for: https://pypi.org/simple/pillow/ 2025-09-07T09:15:13.8253424Z #43 0.486 DEBUG Sending revalidation request for: https://pypi.org/simple/pillow/ 2025-09-07T09:15:13.8254091Z #43 0.486 DEBUG Found stale response for: https://pypi.org/simple/protobuf/ 2025-09-07T09:15:13.8254783Z #43 0.486 DEBUG Sending revalidation request for: https://pypi.org/simple/protobuf/ 2025-09-07T09:15:13.8255460Z #43 0.488 DEBUG Found stale response for: https://pypi.org/simple/regex/ 2025-09-07T09:15:13.8256107Z #43 0.488 DEBUG Sending revalidation request for: https://pypi.org/simple/regex/ 2025-09-07T09:15:13.8256756Z #43 0.489 DEBUG Found stale response for: https://pypi.org/simple/pyzmq/ 2025-09-07T09:15:13.8257394Z #43 0.489 DEBUG Sending revalidation request for: https://pypi.org/simple/pyzmq/ 2025-09-07T09:15:13.8258085Z #43 0.504 DEBUG Found not-modified response for: https://pypi.org/simple/pyzmq/ 2025-09-07T09:15:13.8258784Z #43 0.508 DEBUG Found not-modified response for: https://pypi.org/simple/cachetools/ 2025-09-07T09:15:13.8259499Z #43 0.508 DEBUG Found not-modified response for: https://pypi.org/simple/psutil/ 2025-09-07T09:15:13.8260221Z #43 0.509 DEBUG Found not-modified response for: 
https://pypi.org/simple/sentencepiece/ 2025-09-07T09:15:13.8260951Z #43 0.510 DEBUG Found not-modified response for: https://pypi.org/simple/requests/ 2025-09-07T09:15:13.8261643Z #43 0.510 DEBUG Found not-modified response for: https://pypi.org/simple/tqdm/ 2025-09-07T09:15:13.8262307Z #43 0.510 DEBUG Found not-modified response for: https://pypi.org/simple/blake3/ 2025-09-07T09:15:13.8263067Z #43 0.511 DEBUG Found not-modified response for: https://pypi.org/simple/py-cpuinfo/ 2025-09-07T09:15:13.8263809Z #43 0.511 DEBUG Found not-modified response for: https://pypi.org/simple/transformers/ 2025-09-07T09:15:13.8264524Z #43 0.511 DEBUG Found not-modified response for: https://pypi.org/simple/fastapi/ 2025-09-07T09:15:13.8265377Z #43 0.511 DEBUG Found not-modified response for: https://pypi.org/simple/openai/ 2025-09-07T09:15:13.8266047Z #43 0.512 DEBUG Found not-modified response for: https://pypi.org/simple/pydantic/ 2025-09-07T09:15:13.8266788Z #43 0.513 DEBUG Found not-modified response for: https://pypi.org/simple/prometheus-client/ 2025-09-07T09:15:13.8267477Z #43 0.513 DEBUG Found stale response for: https://pypi.org/simple/aiohttp/ 2025-09-07T09:15:13.8268169Z #43 0.513 DEBUG Sending revalidation request for: https://pypi.org/simple/aiohttp/ 2025-09-07T09:15:13.8268986Z #43 0.513 DEBUG Found not-modified response for: https://pypi.org/simple/prometheus-fastapi-instrumentator/ 2025-09-07T09:15:13.8269793Z #43 0.513 DEBUG Found not-modified response for: https://pypi.org/simple/tiktoken/ 2025-09-07T09:15:13.8270531Z #43 0.513 DEBUG Found not-modified response for: https://pypi.org/simple/lm-format-enforcer/ 2025-09-07T09:15:13.8271262Z #43 0.514 DEBUG Found not-modified response for: https://pypi.org/simple/llguidance/ 2025-09-07T09:15:13.8272025Z #43 0.514 DEBUG Found not-modified response for: https://pypi.org/simple/outlines-core/ 2025-09-07T09:15:13.8272748Z #43 0.514 DEBUG Found not-modified response for: https://pypi.org/simple/diskcache/ 
2025-09-07T09:15:13.8273414Z #43 0.514 DEBUG Found not-modified response for: https://pypi.org/simple/lark/ 2025-09-07T09:15:13.8274088Z #43 0.515 DEBUG Found not-modified response for: https://pypi.org/simple/xgrammar/ 2025-09-07T09:15:13.8274812Z #43 0.515 DEBUG Found not-modified response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T09:15:13.8275546Z #43 0.515 DEBUG Found not-modified response for: https://pypi.org/simple/filelock/ 2025-09-07T09:15:13.8276273Z #43 0.515 DEBUG Found not-modified response for: https://pypi.org/simple/partial-json-parser/ 2025-09-07T09:15:13.8277008Z #43 0.515 DEBUG Found not-modified response for: https://pypi.org/simple/msgspec/ 2025-09-07T09:15:13.8277669Z #43 0.516 DEBUG Found not-modified response for: https://pypi.org/simple/gguf/ 2025-09-07T09:15:13.8278382Z #43 0.516 DEBUG Found not-modified response for: https://pypi.org/simple/mistral-common/ 2025-09-07T09:15:13.8279166Z #43 0.516 DEBUG Found not-modified response for: https://pypi.org/simple/opencv-python-headless/ 2025-09-07T09:15:13.8279894Z #43 0.517 DEBUG Found not-modified response for: https://pypi.org/simple/pyyaml/ 2025-09-07T09:15:13.8280547Z #43 0.517 DEBUG Found not-modified response for: https://pypi.org/simple/six/ 2025-09-07T09:15:13.8281223Z #43 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/setuptools/ 2025-09-07T09:15:13.8281904Z #43 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/einops/ 2025-09-07T09:15:13.8282633Z #43 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/compressed-tensors/ 2025-09-07T09:15:13.8283339Z #43 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/depyf/ 2025-09-07T09:15:13.8284029Z #43 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/cloudpickle/ 2025-09-07T09:15:13.8284738Z #43 0.519 DEBUG Found not-modified response for: https://pypi.org/simple/watchfiles/ 2025-09-07T09:15:13.8285481Z #43 0.520 DEBUG Found 
not-modified response for: https://pypi.org/simple/python-json-logger/ 2025-09-07T09:15:13.8286198Z #43 0.520 DEBUG Found not-modified response for: https://pypi.org/simple/scipy/ 2025-09-07T09:15:13.8286839Z #43 0.521 DEBUG Found not-modified response for: https://pypi.org/simple/ninja/ 2025-09-07T09:15:13.8287516Z #43 0.522 DEBUG Found not-modified response for: https://pypi.org/simple/pybase64/ 2025-09-07T09:15:13.8288213Z #43 0.522 DEBUG Found not-modified response for: https://pypi.org/simple/cbor2/ 2025-09-07T09:15:13.8288903Z #43 0.523 DEBUG Found not-modified response for: https://pypi.org/simple/setproctitle/ 2025-09-07T09:15:13.8289633Z #43 0.523 DEBUG Found not-modified response for: https://pypi.org/simple/openai-harmony/ 2025-09-07T09:15:13.8290325Z #43 0.523 DEBUG Found not-modified response for: https://pypi.org/simple/numba/ 2025-09-07T09:15:13.8290987Z #43 0.524 DEBUG Found not-modified response for: https://pypi.org/simple/numpy/ 2025-09-07T09:15:13.8291660Z #43 0.527 DEBUG Found not-modified response for: https://pypi.org/simple/tokenizers/ 2025-09-07T09:15:13.8292810Z #43 0.528 DEBUG Found not-modified response for: https://pypi.org/simple/pillow/ 2025-09-07T09:15:13.8293503Z #43 0.529 DEBUG Found not-modified response for: https://pypi.org/simple/protobuf/ 2025-09-07T09:15:13.8294296Z #43 0.530 DEBUG Found not-modified response for: https://pypi.org/simple/regex/ 2025-09-07T09:15:13.8294990Z #43 0.538 DEBUG Found not-modified response for: https://pypi.org/simple/aiohttp/ 2025-09-07T09:15:13.8295724Z #43 0.544 DEBUG Searching for a compatible version of lm-format-enforcer (>=0.11.3, <0.11.3+) 2025-09-07T09:15:13.8296594Z #43 0.544 DEBUG Selecting: lm-format-enforcer==0.11.3 [compatible] (lm_format_enforcer-0.11.3-py3-none-any.whl) 2025-09-07T09:15:13.8297790Z #43 0.556 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10 2025-09-07T09:15:13.9096663Z #43 
0.556 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies >=3.16.1 2025-09-07T09:15:13.9098134Z #43 0.557 DEBUG Found installed version of numpy==2.3.2 (from file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies * 2025-09-07T09:15:13.9099756Z #43 0.557 DEBUG Found installed version of pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies * 2025-09-07T09:15:13.9100855Z #43 0.558 DEBUG No cache entry for: https://pypi.org/simple/ray/ 2025-09-07T09:15:13.9102428Z #43 0.558 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/0e/66/d781ab0636570d32c745c4e389b1c6b713115905cca69ab6233508622edd/pyzmq-27.0.2-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:15:13.9105102Z #43 0.559 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/6c/56/3124f61d37a7a4e7cc96afc5492c78ba0cb551151e530b54669ddd1436ef/cachetools-6.2.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9107588Z #43 0.559 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/bf/b9/b0eb3f3cbcb734d930fdf839431606844a825b23eaf9a6ab371edac8162c/psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9110337Z #43 0.559 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/04/88/14f2f4a2b922d8b39be45bf63d79e6cd3a9b2f248b2fcb98a69b12af12f5/sentencepiece-0.2.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:15:13.9112661Z #43 0.559 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/1e/db/4254e3eabe8020b458f1a747140d32277ec7a271daf1d235b70dc0b4e6e3/requests-2.32.5-py3-none-any.whl.metadata 2025-09-07T09:15:13.9114661Z #43 0.559 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/d0/30/dc54f88dd4a2b5dc8a0279bdd7270e735851848b762aeb1c1184ed1f6b14/tqdm-4.67.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9116699Z #43 0.559 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/00/e1/47887212baa7bc0532880d33d5eafbdb46fcc4b53789b903282a74a85b5b/openai-1.106.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9118793Z #43 0.559 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/71/7c/283c3dd35e00e22a7803a0b2a65251347b745474a82399be058bde1c9f15/transformers-4.56.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9120972Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/6a/c0/ec2b1c8712ca690e5d61979dee872603e92b8a32f94cc1b72d53beab008a/pydantic-2.11.7-py3-none-any.whl.metadata 2025-09-07T09:15:13.9123239Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/5c/04/a86bfb3c20e859e43ead0b13be59afd98feb166ea929e76fa3d190f65f6e/blake3-1.0.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9125490Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e0/a9/023730ba63db1e494a271cb018dcd361bd2c917ba7004c3e49d5daf795a2/py_cpuinfo-9.0.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9127570Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/32/ae/ec06af4fe3ee72d16973474f122541746196aaa16cea6f66d18b963c6177/prometheus_client-0.22.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9129802Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/27/72/0824c18f3bc75810f55dacc2dd933f6ec829771180245ae3cc976195dec0/prometheus_fastapi_instrumentator-7.1.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9132078Z #43 0.560 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/d5/2d/4d77f6feb9292bfdd23d5813e442b3bba883f42d0ac78ef5fdc56873f756/tiktoken-0.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9134633Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/a0/ef/11292bb0b85cf4c93447cab5a29f64576ed14d3ab4280e35ddd23486594a/lm_format_enforcer-0.11.3-py3-none-any.whl.metadata 2025-09-07T09:15:13.9136689Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/3f/27/4570e78fc0bf5ea0ca45eb1de3818a23787af9b390c0b0a0033a1b8236f9/diskcache-5.6.3-py3-none-any.whl.metadata 2025-09-07T09:15:13.9138575Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/2d/00/d90b10b962b4277f5e64a78b6609968859ff86889f5b898c1a778c06ec00/lark-1.2.2-py3-none-any.whl.metadata 2025-09-07T09:15:13.9140563Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/cb/40/1f922794af3dc7503f19319a8804b398a161a2cd54183cff8b12225b8d85/partial_json_parser-0.2.1.1.post6-py3-none-any.whl.metadata 2025-09-07T09:15:13.9142048Z #43 0.560 DEBUG Adding transitive dependency for lm-format-enforcer==0.11.3: interegular>=0.3.2 2025-09-07T09:15:13.9143530Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/fc/31/6a93a887617ee7deeaa602ca3d02d1c12a6cb8a742a695de5d128f5fa46a/gguf-0.17.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9144855Z #43 0.560 DEBUG Adding transitive dependency for lm-format-enforcer==0.11.3: packaging* 2025-09-07T09:15:13.9145716Z #43 0.560 DEBUG Adding transitive dependency for lm-format-enforcer==0.11.3: pydantic>=1.10.8 2025-09-07T09:15:13.9146457Z #43 0.560 DEBUG Adding transitive dependency for lm-format-enforcer==0.11.3: pyyaml* 2025-09-07T09:15:13.9147964Z #43 0.560 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/d0/ef/c5422ce8af73928d194a6606f8ae36e93a52fd5e8df5abd366903a5ca8da/msgspec-0.19.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9149554Z #43 0.560 DEBUG Searching for a compatible version of outlines-core{platform_machine != 's390x'} (>=0.2.10, <0.2.10+) 2025-09-07T09:15:13.9151243Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/89/53/e19c21e0c4eb1275c3e2c97b081103b6dfb3938172264d283a519bf728b9/opencv_python_headless-4.12.0.88-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata 2025-09-07T09:15:13.9153558Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/b9/2b/614b4752f2e127db5cc206abc23a8c19678e92b23c3db30fc86ab731d3bd/PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9155241Z #43 0.560 DEBUG Selecting: outlines-core==0.2.10 [compatible] (outlines_core-0.2.10-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9156860Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/87/62/9773de14fe6c45c23649e98b83231fffd7b9892b6cf863251dc2afa73643/einops-0.8.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9158755Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d2/81/e3073017a8f5c75169e79108eda209e6089e3f96c9f197d307cbda7df71c/compressed_tensors-0.11.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9160132Z #43 0.560 DEBUG Adding transitive dependency for outlines-core==0.2.10: outlines-core==0.2.10 2025-09-07T09:15:13.9161031Z #43 0.560 DEBUG Adding transitive dependency for outlines-core==0.2.10: outlines-core{platform_machine != 's390x'}==0.2.10 2025-09-07T09:15:13.9162531Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/7e/e8/64c37fadfc2816a7701fa8a6ed8d87327c7d54eacfbfb6edab14a2f2be75/cloudpickle-3.1.1-py3-none-any.whl.metadata 
2025-09-07T09:15:13.9164424Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/28/4d/1192acbcdc5e843f5e5d51f6e8788f2b60a9fe0b578ac385ded67a0b0b26/depyf-0.19.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9165672Z #43 0.560 DEBUG Searching for a compatible version of outlines-core (==0.2.10) 2025-09-07T09:15:13.9167153Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/8c/77/e3362fe308358dc9f8588102481e599c83e1b91c2ae843780a7ded939a35/watchfiles-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9168870Z #43 0.560 DEBUG Selecting: outlines-core==0.2.10 [compatible] (outlines_core-0.2.10-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9170435Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/08/20/0f2523b9e50a8052bc6a8b732dfc8568abbdc42010aef03a2d750bdab3b2/python_json_logger-3.3.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9172526Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/51/1e/79023ca3bbb13a015d7d2757ecca3b81293c663694c35d6541b4dca53e98/scipy-1.16.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata 2025-09-07T09:15:13.9175113Z #43 0.560 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/ed/de/0e6edf44d6a04dabd0318a519125ed0415ce437ad5a1ec9b9be03d9048cf/ninja-1.13.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata 2025-09-07T09:15:13.9177426Z #43 0.561 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/de/5e/3bf5acea47a96a28c121b167f5ef659cf71208b19e52a88cdfa5c37f1fcc/aiohttp-3.12.15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9179841Z #43 0.561 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/b2/02/5c891bb5fe0691cc1bad336e3a94b9097fbcf9707ec8ddc1dce9f0397289/regex-2025.9.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:15:13.9182150Z #43 0.561 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/40/01/2e730bd1c25392fc32e3268e02446f0d77cb51a2c3a8486b1798e34d5805/protobuf-6.32.0-cp39-abi3-manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9184903Z #43 0.561 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d4/61/aeab3402c26874b74bb67a7f2c4b569dde29b51032c5384db592e7b216f4/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9187187Z #43 0.561 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/1f/ec/dcdcace0ffcf3a532cca910e0c351b62d3a7decf0b091ea8cf856d2a67a6/openai_harmony-0.0.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9189573Z #43 0.561 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d0/99/71630546b9395b095f4082be41165d1078204d1696c2d9baade3de3202d0/setproctitle-1.3.7-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl.metadata 2025-09-07T09:15:13.9192206Z #43 0.561 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/a8/6e/3499eaa2b858c7695a447b6311303f06ffc90fc2c45851337121661f1f5c/cbor2-5.7.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:15:13.9194946Z #43 0.561 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/ee/87/d9baf98cbfc37b8657290ad4421f3a3c36aa0eafe4872c5859cfb52f3448/pybase64-1.4.2-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl.metadata 2025-09-07T09:15:13.9196602Z #43 0.561 DEBUG Found stale response for: https://pypi.org/simple/interegular/ 2025-09-07T09:15:13.9197325Z #43 0.561 DEBUG 
Sending revalidation request for: https://pypi.org/simple/interegular/ 2025-09-07T09:15:13.9198098Z #43 0.561 DEBUG Found stale response for: https://pypi.org/simple/packaging/ 2025-09-07T09:15:13.9198792Z #43 0.561 DEBUG Sending revalidation request for: https://pypi.org/simple/packaging/ 2025-09-07T09:15:13.9200358Z #43 0.561 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c2/83/db792ce386d1c13d875a03d6ff5ba31612cfb558ecf5b945910db9505574/outlines_core-0.2.10-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9201982Z #43 0.561 DEBUG Searching for a compatible version of outlines-core{platform_machine != 's390x'} (==0.2.10) 2025-09-07T09:15:13.9203085Z #43 0.561 DEBUG Selecting: outlines-core==0.2.10 [compatible] (outlines_core-0.2.10-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9204013Z #43 0.561 DEBUG Searching for a compatible version of diskcache (>=5.6.3, <5.6.3+) 2025-09-07T09:15:13.9204737Z #43 0.561 DEBUG Selecting: diskcache==5.6.3 [compatible] (diskcache-5.6.3-py3-none-any.whl) 2025-09-07T09:15:13.9205532Z #43 0.561 DEBUG Searching for a compatible version of lark (>=1.2.2, <1.2.2+) 2025-09-07T09:15:13.9206142Z #43 0.561 DEBUG Selecting: lark==1.2.2 [compatible] (lark-1.2.2-py3-none-any.whl) 2025-09-07T09:15:13.9207170Z #43 0.561 DEBUG Searching for a compatible version of xgrammar{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (>=0.1.23, <0.1.23+) 2025-09-07T09:15:13.9208405Z #43 0.561 DEBUG Selecting: xgrammar==0.1.23 [compatible] (xgrammar-0.1.23-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9209333Z #43 0.561 DEBUG Adding transitive dependency for xgrammar==0.1.23: xgrammar==0.1.23 2025-09-07T09:15:13.9210398Z #43 0.561 DEBUG Adding transitive dependency for xgrammar==0.1.23: xgrammar{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 
'x86_64'}==0.1.23 2025-09-07T09:15:13.9211389Z #43 0.561 DEBUG Searching for a compatible version of xgrammar (==0.1.23) 2025-09-07T09:15:13.9212228Z #43 0.561 DEBUG Selecting: xgrammar==0.1.23 [compatible] (xgrammar-0.1.23-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9214175Z #43 0.561 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/a1/13/53d950b93a361ef73e5930050916fa36c23fade80ee05cfb0339c044e951/xgrammar-0.1.23-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9215656Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: pydantic* 2025-09-07T09:15:13.9216316Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: torch>=1.10.0 2025-09-07T09:15:13.9217018Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: transformers>=4.38.0 2025-09-07T09:15:13.9217951Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: triton{platform_machine == 'x86_64' and sys_platform == 'linux'}* 2025-09-07T09:15:13.9218812Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: ninja* 2025-09-07T09:15:13.9219416Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: numpy* 2025-09-07T09:15:13.9220118Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: typing-extensions>=4.9.0 2025-09-07T09:15:13.9221253Z #43 0.562 DEBUG Searching for a compatible version of xgrammar{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (==0.1.23) 2025-09-07T09:15:13.9222499Z #43 0.562 DEBUG Selecting: xgrammar==0.1.23 [compatible] (xgrammar-0.1.23-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9223373Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: pydantic* 2025-09-07T09:15:13.9224019Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: torch>=1.10.0 2025-09-07T09:15:13.9224820Z #43 0.562 DEBUG Adding 
transitive dependency for xgrammar==0.1.23: transformers>=4.38.0 2025-09-07T09:15:13.9225704Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: triton{platform_machine == 'x86_64' and sys_platform == 'linux'}* 2025-09-07T09:15:13.9226584Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: ninja* 2025-09-07T09:15:13.9227173Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: numpy* 2025-09-07T09:15:13.9227838Z #43 0.562 DEBUG Adding transitive dependency for xgrammar==0.1.23: typing-extensions>=4.9.0 2025-09-07T09:15:13.9228596Z #43 0.562 DEBUG Searching for a compatible version of compressed-tensors (>=0.11.0, <0.11.0+) 2025-09-07T09:15:13.9229425Z #43 0.562 DEBUG Selecting: compressed-tensors==0.11.0 [compatible] (compressed_tensors-0.11.0-py3-none-any.whl) 2025-09-07T09:15:13.9230298Z #43 0.562 DEBUG Adding transitive dependency for compressed-tensors==0.11.0: torch>=1.7.0 2025-09-07T09:15:13.9231036Z #43 0.562 DEBUG Adding transitive dependency for compressed-tensors==0.11.0: transformers* 2025-09-07T09:15:13.9231784Z #43 0.562 DEBUG Adding transitive dependency for compressed-tensors==0.11.0: pydantic>=2.0 2025-09-07T09:15:13.9232526Z #43 0.562 DEBUG Adding transitive dependency for compressed-tensors==0.11.0: frozendict* 2025-09-07T09:15:13.9233198Z #43 0.562 DEBUG Searching for a compatible version of depyf (>=0.19.0, <0.19.0+) 2025-09-07T09:15:13.9233857Z #43 0.562 DEBUG Selecting: depyf==0.19.0 [compatible] (depyf-0.19.0-py3-none-any.whl) 2025-09-07T09:15:13.9234556Z #43 0.562 DEBUG Adding transitive dependency for depyf==0.19.0: astor* 2025-09-07T09:15:13.9235119Z #43 0.562 DEBUG Adding transitive dependency for depyf==0.19.0: dill* 2025-09-07T09:15:13.9235878Z #43 0.562 DEBUG Searching for a compatible version of numba{python_full_version >= '3.10'} (>=0.61.2, <0.61.2+) 2025-09-07T09:15:13.9236814Z #43 0.562 DEBUG Selecting: numba==0.61.2 [compatible] 
(numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:13.9237626Z #43 0.562 DEBUG Found stale response for: https://pypi.org/simple/triton/ 2025-09-07T09:15:13.9238262Z #43 0.562 DEBUG Sending revalidation request for: https://pypi.org/simple/triton/ 2025-09-07T09:15:13.9238915Z #43 0.562 DEBUG Adding transitive dependency for numba==0.61.2: numba==0.61.2 2025-09-07T09:15:13.9239640Z #43 0.562 DEBUG Adding transitive dependency for numba==0.61.2: numba{python_full_version >= '3.10'}==0.61.2 2025-09-07T09:15:13.9240359Z #43 0.562 DEBUG Searching for a compatible version of numba (==0.61.2) 2025-09-07T09:15:13.9241144Z #43 0.562 DEBUG Selecting: numba==0.61.2 [compatible] (numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:13.9241931Z #43 0.562 DEBUG Found stale response for: https://pypi.org/simple/torch/ 2025-09-07T09:15:13.9242571Z #43 0.562 DEBUG Sending revalidation request for: https://pypi.org/simple/torch/ 2025-09-07T09:15:13.9243195Z #43 0.562 DEBUG Found stale response for: https://pypi.org/simple/astor/ 2025-09-07T09:15:13.9243828Z #43 0.562 DEBUG Sending revalidation request for: https://pypi.org/simple/astor/ 2025-09-07T09:15:13.9244458Z #43 0.562 DEBUG Found stale response for: https://pypi.org/simple/dill/ 2025-09-07T09:15:13.9245071Z #43 0.562 DEBUG Sending revalidation request for: https://pypi.org/simple/dill/ 2025-09-07T09:15:13.9245727Z #43 0.563 DEBUG Found stale response for: https://pypi.org/simple/frozendict/ 2025-09-07T09:15:13.9246436Z #43 0.563 DEBUG Sending revalidation request for: https://pypi.org/simple/frozendict/ 2025-09-07T09:15:13.9247929Z #43 0.563 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/9a/2d/e518df036feab381c23a624dac47f8445ac55686ec7f11083655eb707da3/numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata 2025-09-07T09:15:13.9249422Z #43 0.563 DEBUG Adding transitive dependency for numba==0.61.2: 
llvmlite>=0.44.0.dev0, <0.45 2025-09-07T09:15:13.9250110Z #43 0.563 DEBUG Adding transitive dependency for numba==0.61.2: numpy>=1.24, <2.3 2025-09-07T09:15:13.9250841Z #43 0.563 DEBUG Searching for a compatible version of numba{python_full_version >= '3.10'} (==0.61.2) 2025-09-07T09:15:13.9251736Z #43 0.563 DEBUG Selecting: numba==0.61.2 [compatible] (numba-0.61.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:13.9252648Z #43 0.563 DEBUG Adding transitive dependency for numba==0.61.2: llvmlite>=0.44.0.dev0, <0.45 2025-09-07T09:15:13.9253440Z #43 0.563 DEBUG Adding transitive dependency for numba==0.61.2: numpy>=1.24, <2.3 2025-09-07T09:15:13.9254222Z #43 0.563 DEBUG Searching for a compatible version of regex (*) 2025-09-07T09:15:13.9255139Z #43 0.563 DEBUG Selecting: regex==2025.9.1 [compatible] (regex-2025.9.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:13.9256113Z #43 0.563 DEBUG Searching for a compatible version of cachetools (*) 2025-09-07T09:15:13.9256799Z #43 0.563 DEBUG Selecting: cachetools==6.2.0 [compatible] (cachetools-6.2.0-py3-none-any.whl) 2025-09-07T09:15:13.9257469Z #43 0.563 DEBUG Searching for a compatible version of psutil (*) 2025-09-07T09:15:13.9258433Z #43 0.563 DEBUG Selecting: psutil==7.0.0 [compatible] (psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9272816Z #43 0.563 DEBUG Searching for a compatible version of sentencepiece (*) 2025-09-07T09:15:13.9273764Z #43 0.563 DEBUG Selecting: sentencepiece==0.2.1 [compatible] (sentencepiece-0.2.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:13.9274658Z #43 0.563 DEBUG Searching for a compatible version of numpy (>=1.24, <2.3) 2025-09-07T09:15:13.9275459Z #43 0.563 DEBUG Selecting: numpy==2.2.6 [compatible] (numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 
2025-09-07T09:15:13.9277182Z #43 0.563 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/8c/3d/1e1db36cfd41f895d266b103df00ca5b3cbe965184df824dec5c08c6b803/numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9278601Z #43 0.563 DEBUG Searching for a compatible version of requests (>=2.26.0) 2025-09-07T09:15:13.9279275Z #43 0.563 DEBUG Selecting: requests==2.32.5 [compatible] (requests-2.32.5-py3-none-any.whl) 2025-09-07T09:15:13.9280025Z #43 0.563 DEBUG Adding transitive dependency for requests==2.32.5: charset-normalizer>=2, <4 2025-09-07T09:15:13.9280737Z #43 0.563 DEBUG Adding transitive dependency for requests==2.32.5: idna>=2.5, <4 2025-09-07T09:15:13.9281401Z #43 0.563 DEBUG Adding transitive dependency for requests==2.32.5: urllib3>=1.21.1, <3 2025-09-07T09:15:13.9282099Z #43 0.563 DEBUG Adding transitive dependency for requests==2.32.5: certifi>=2017.4.17 2025-09-07T09:15:13.9282709Z #43 0.563 DEBUG Searching for a compatible version of tqdm (*) 2025-09-07T09:15:13.9283277Z #43 0.563 DEBUG Selecting: tqdm==4.67.1 [compatible] (tqdm-4.67.1-py3-none-any.whl) 2025-09-07T09:15:13.9283874Z #43 0.563 DEBUG Searching for a compatible version of blake3 (*) 2025-09-07T09:15:13.9284632Z #43 0.563 DEBUG Selecting: blake3==1.0.5 [compatible] (blake3-1.0.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9285428Z #43 0.563 DEBUG Searching for a compatible version of py-cpuinfo (*) 2025-09-07T09:15:13.9286085Z #43 0.563 DEBUG Selecting: py-cpuinfo==9.0.0 [compatible] (py_cpuinfo-9.0.0-py3-none-any.whl) 2025-09-07T09:15:13.9286796Z #43 0.563 DEBUG Found stale response for: https://pypi.org/simple/idna/ 2025-09-07T09:15:13.9287426Z #43 0.563 DEBUG Sending revalidation request for: https://pypi.org/simple/idna/ 2025-09-07T09:15:13.9288060Z #43 0.563 DEBUG Searching for a compatible version of transformers (>=4.55.2) 2025-09-07T09:15:13.9288782Z #43 0.563 DEBUG Selecting: 
transformers==4.56.1 [compatible] (transformers-4.56.1-py3-none-any.whl) 2025-09-07T09:15:13.9289510Z #43 0.563 DEBUG Adding transitive dependency for transformers==4.56.1: filelock* 2025-09-07T09:15:13.9290258Z #43 0.563 DEBUG Adding transitive dependency for transformers==4.56.1: huggingface-hub>=0.34.0, <1.0 2025-09-07T09:15:13.9291015Z #43 0.563 DEBUG Adding transitive dependency for transformers==4.56.1: numpy>=1.17 2025-09-07T09:15:13.9291739Z #43 0.563 DEBUG Adding transitive dependency for transformers==4.56.1: packaging>=20.0 2025-09-07T09:15:13.9292624Z #43 0.563 DEBUG Found stale response for: https://pypi.org/simple/certifi/ 2025-09-07T09:15:13.9293714Z #43 0.563 DEBUG Sending revalidation request for: https://pypi.org/simple/certifi/ 2025-09-07T09:15:13.9294427Z #43 0.563 DEBUG Adding transitive dependency for transformers==4.56.1: pyyaml>=5.1 2025-09-07T09:15:13.9295208Z #43 0.563 DEBUG Adding transitive dependency for transformers==4.56.1: regex<2019.12.17 | >=2019.12.17+ 2025-09-07T09:15:13.9295969Z #43 0.563 DEBUG Adding transitive dependency for transformers==4.56.1: requests* 2025-09-07T09:15:13.9296809Z #43 0.563 DEBUG Adding transitive dependency for transformers==4.56.1: tokenizers>=0.22.0, <=0.23.0+ 2025-09-07T09:15:13.9297599Z #43 0.563 DEBUG Adding transitive dependency for transformers==4.56.1: safetensors>=0.4.3 2025-09-07T09:15:13.9298324Z #43 0.563 DEBUG Adding transitive dependency for transformers==4.56.1: tqdm>=4.27 2025-09-07T09:15:13.9298997Z #43 0.564 DEBUG Found stale response for: https://pypi.org/simple/urllib3/ 2025-09-07T09:15:13.9299657Z #43 0.564 DEBUG Sending revalidation request for: https://pypi.org/simple/urllib3/ 2025-09-07T09:15:13.9300374Z #43 0.564 DEBUG Searching for a compatible version of tokenizers (>=0.22.0, <=0.23.0+) 2025-09-07T09:15:13.9301289Z #43 0.564 DEBUG Selecting: tokenizers==0.22.0 [compatible] (tokenizers-0.22.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9302283Z 
#43 0.564 DEBUG Adding transitive dependency for tokenizers==0.22.0: huggingface-hub>=0.16.4, <1.0 2025-09-07T09:15:13.9303027Z #43 0.564 DEBUG Searching for a compatible version of protobuf (*) 2025-09-07T09:15:13.9303769Z #43 0.564 DEBUG Selecting: protobuf==6.32.0 [compatible] (protobuf-6.32.0-cp39-abi3-manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9304712Z #43 0.564 DEBUG Searching for a compatible version of fastapi[standard] (>=0.115.0) 2025-09-07T09:15:13.9305405Z #43 0.564 DEBUG Selecting: fastapi==0.116.1 [compatible] (fastapi-0.116.1-py3-none-any.whl) 2025-09-07T09:15:13.9306122Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: fastapi==0.116.1 2025-09-07T09:15:13.9306838Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: fastapi[standard]==0.116.1 2025-09-07T09:15:13.9307519Z #43 0.564 DEBUG Searching for a compatible version of fastapi (==0.116.1) 2025-09-07T09:15:13.9308172Z #43 0.564 DEBUG Selecting: fastapi==0.116.1 [compatible] (fastapi-0.116.1-py3-none-any.whl) 2025-09-07T09:15:13.9308839Z #43 0.564 DEBUG Found stale response for: https://pypi.org/simple/llvmlite/ 2025-09-07T09:15:13.9309509Z #43 0.564 DEBUG Sending revalidation request for: https://pypi.org/simple/llvmlite/ 2025-09-07T09:15:13.9310794Z #43 0.564 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e5/47/d63c60f59a59467fda0f93f46335c9d18526d7071f025cb5b89d5353ea42/fastapi-0.116.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9312068Z #43 0.564 DEBUG Found stale response for: https://pypi.org/simple/huggingface-hub/ 2025-09-07T09:15:13.9312806Z #43 0.564 DEBUG Sending revalidation request for: https://pypi.org/simple/huggingface-hub/ 2025-09-07T09:15:13.9313615Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: starlette>=0.40.0, <0.48.0 2025-09-07T09:15:13.9314570Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: pydantic>=1.7.4, <1.8 | >=1.8+, <1.8.1 | >=1.8.1+, <2.0.0 | >=2.0.0+, <2.0.1 | 
>=2.0.1+, <2.1.0 | >=2.1.0+, <3.0.0 2025-09-07T09:15:13.9315501Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: typing-extensions>=4.8.0 2025-09-07T09:15:13.9316224Z #43 0.564 DEBUG Searching for a compatible version of fastapi[standard] (==0.116.1) 2025-09-07T09:15:13.9316923Z #43 0.564 DEBUG Selecting: fastapi==0.116.1 [compatible] (fastapi-0.116.1-py3-none-any.whl) 2025-09-07T09:15:13.9317672Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: fastapi-cli[standard]>=0.0.8 2025-09-07T09:15:13.9318396Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: httpx>=0.23.0 2025-09-07T09:15:13.9319077Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: jinja2>=3.1.5 2025-09-07T09:15:13.9319832Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: python-multipart>=0.0.18 2025-09-07T09:15:13.9320590Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: email-validator>=2.0.0 2025-09-07T09:15:13.9321590Z #43 0.564 DEBUG Adding transitive dependency for fastapi==0.116.1: uvicorn[standard]>=0.12.0 2025-09-07T09:15:13.9322241Z #43 0.564 DEBUG Searching for a compatible version of aiohttp (*) 2025-09-07T09:15:13.9323073Z #43 0.564 DEBUG Selecting: aiohttp==3.12.15 [compatible] (aiohttp-3.12.15-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9323989Z #43 0.564 DEBUG Adding transitive dependency for aiohttp==3.12.15: aiohappyeyeballs>=2.5.0 2025-09-07T09:15:13.9324702Z #43 0.564 DEBUG Adding transitive dependency for aiohttp==3.12.15: aiosignal>=1.4.0 2025-09-07T09:15:13.9325364Z #43 0.564 DEBUG Adding transitive dependency for aiohttp==3.12.15: attrs>=17.3.0 2025-09-07T09:15:13.9326045Z #43 0.564 DEBUG Adding transitive dependency for aiohttp==3.12.15: frozenlist>=1.1.1 2025-09-07T09:15:13.9326783Z #43 0.564 DEBUG Adding transitive dependency for aiohttp==3.12.15: multidict>=4.5, <7.0 2025-09-07T09:15:13.9327488Z #43 0.564 DEBUG Adding transitive 
dependency for aiohttp==3.12.15: propcache>=0.2.0 2025-09-07T09:15:13.9328161Z #43 0.564 DEBUG Adding transitive dependency for aiohttp==3.12.15: yarl>=1.17.0, <2.0 2025-09-07T09:15:13.9328897Z #43 0.564 DEBUG Found stale response for: https://pypi.org/simple/python-multipart/ 2025-09-07T09:15:13.9329643Z #43 0.564 DEBUG Sending revalidation request for: https://pypi.org/simple/python-multipart/ 2025-09-07T09:15:13.9330312Z #43 0.564 DEBUG Searching for a compatible version of openai (>=1.99.1) 2025-09-07T09:15:13.9330956Z #43 0.564 DEBUG Selecting: openai==1.106.1 [compatible] (openai-1.106.1-py3-none-any.whl) 2025-09-07T09:15:13.9331638Z #43 0.564 DEBUG Adding transitive dependency for openai==1.106.1: anyio>=3.5.0, <5 2025-09-07T09:15:13.9332274Z #43 0.564 DEBUG Found stale response for: https://pypi.org/simple/httpx/ 2025-09-07T09:15:13.9333009Z #43 0.564 DEBUG Adding transitive dependency for openai==1.106.1: distro>=1.7.0, <2 2025-09-07T09:15:13.9333870Z #43 0.564 DEBUG Sending revalidation request for: https://pypi.org/simple/httpx/ 2025-09-07T09:15:13.9334571Z #43 0.564 DEBUG Adding transitive dependency for openai==1.106.1: httpx>=0.23.0, <1 2025-09-07T09:15:13.9335360Z #43 0.564 DEBUG Adding transitive dependency for openai==1.106.1: jiter>=0.4.0, <1 2025-09-07T09:15:13.9336137Z #43 0.564 DEBUG Adding transitive dependency for openai==1.106.1: pydantic>=1.9.0, <3 2025-09-07T09:15:13.9336791Z #43 0.564 DEBUG Adding transitive dependency for openai==1.106.1: sniffio* 2025-09-07T09:15:13.9337403Z #43 0.564 DEBUG Adding transitive dependency for openai==1.106.1: tqdm>4 2025-09-07T09:15:13.9338111Z #43 0.564 DEBUG Adding transitive dependency for openai==1.106.1: typing-extensions>=4.11, <5 2025-09-07T09:15:13.9338827Z #43 0.564 DEBUG Searching for a compatible version of pydantic (>=2.11.7, <3.0.0) 2025-09-07T09:15:13.9339594Z #43 0.564 DEBUG Selecting: pydantic==2.11.7 [compatible] (pydantic-2.11.7-py3-none-any.whl) 2025-09-07T09:15:13.9340343Z #43 0.564 
DEBUG Adding transitive dependency for pydantic==2.11.7: annotated-types>=0.6.0 2025-09-07T09:15:13.9341140Z #43 0.564 DEBUG Adding transitive dependency for pydantic==2.11.7: pydantic-core>=2.33.2, <2.33.2+ 2025-09-07T09:15:13.9341953Z #43 0.564 DEBUG Adding transitive dependency for pydantic==2.11.7: typing-extensions>=4.12.2 2025-09-07T09:15:13.9342724Z #43 0.564 DEBUG Adding transitive dependency for pydantic==2.11.7: typing-inspection>=0.4.0 2025-09-07T09:15:13.9343428Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/jinja2/ 2025-09-07T09:15:13.9344093Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/jinja2/ 2025-09-07T09:15:13.9344966Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/fastapi-cli/ 2025-09-07T09:15:13.9345660Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/fastapi-cli/ 2025-09-07T09:15:13.9346362Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/email-validator/ 2025-09-07T09:15:13.9347088Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/email-validator/ 2025-09-07T09:15:13.9347767Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/uvicorn/ 2025-09-07T09:15:13.9348456Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/uvicorn/ 2025-09-07T09:15:13.9349110Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/starlette/ 2025-09-07T09:15:13.9349788Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/starlette/ 2025-09-07T09:15:13.9350497Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/aiohappyeyeballs/ 2025-09-07T09:15:13.9351226Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/aiohappyeyeballs/ 2025-09-07T09:15:13.9351938Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/aiosignal/ 2025-09-07T09:15:13.9352601Z #43 0.565 DEBUG Sending revalidation request for: 
https://pypi.org/simple/aiosignal/ 2025-09-07T09:15:13.9353622Z #43 0.565 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.11, <5 2025-09-07T09:15:13.9354632Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T09:15:13.9355413Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T09:15:13.9356117Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/attrs/ 2025-09-07T09:15:13.9356743Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/attrs/ 2025-09-07T09:15:13.9357403Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/frozenlist/ 2025-09-07T09:15:13.9358076Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/frozenlist/ 2025-09-07T09:15:13.9358763Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/propcache/ 2025-09-07T09:15:13.9359444Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/propcache/ 2025-09-07T09:15:13.9360459Z #43 0.565 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.12.2, <5 2025-09-07T09:15:13.9361421Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/anyio/ 2025-09-07T09:15:13.9362044Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/anyio/ 2025-09-07T09:15:13.9362681Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/distro/ 2025-09-07T09:15:13.9363329Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/distro/ 2025-09-07T09:15:13.9363958Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/jiter/ 2025-09-07T09:15:13.9364591Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/jiter/ 2025-09-07T09:15:13.9365260Z #43 0.565 DEBUG Found stale 
response for: https://pypi.org/simple/sniffio/ 2025-09-07T09:15:13.9365918Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/sniffio/ 2025-09-07T09:15:13.9366602Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/annotated-types/ 2025-09-07T09:15:13.9367334Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/annotated-types/ 2025-09-07T09:15:13.9368078Z #43 0.565 DEBUG Found stale response for: https://pypi.org/simple/typing-inspection/ 2025-09-07T09:15:13.9368815Z #43 0.565 DEBUG Sending revalidation request for: https://pypi.org/simple/typing-inspection/ 2025-09-07T09:15:13.9369534Z #43 0.566 DEBUG Found stale response for: https://pypi.org/simple/safetensors/ 2025-09-07T09:15:13.9370210Z #43 0.566 DEBUG Sending revalidation request for: https://pypi.org/simple/safetensors/ 2025-09-07T09:15:13.9370947Z #43 0.568 DEBUG Found stale response for: https://pypi.org/simple/multidict/ 2025-09-07T09:15:13.9371633Z #43 0.568 DEBUG Sending revalidation request for: https://pypi.org/simple/multidict/ 2025-09-07T09:15:13.9372341Z #43 0.569 DEBUG Found not-modified response for: https://pypi.org/simple/interegular/ 2025-09-07T09:15:13.9373307Z #43 0.569 DEBUG Found not-modified response for: https://pypi.org/simple/packaging/ 2025-09-07T09:15:13.9374038Z #43 0.578 DEBUG Found not-modified response for: https://pypi.org/simple/torch/ 2025-09-07T09:15:13.9374764Z #43 0.578 DEBUG Found not-modified response for: https://pypi.org/simple/triton/ 2025-09-07T09:15:13.9375432Z #43 0.578 DEBUG Found not-modified response for: https://pypi.org/simple/astor/ 2025-09-07T09:15:13.9376106Z #43 0.578 DEBUG Found not-modified response for: https://pypi.org/simple/dill/ 2025-09-07T09:15:13.9376809Z #43 0.578 DEBUG Found not-modified response for: https://pypi.org/simple/frozendict/ 2025-09-07T09:15:13.9377501Z #43 0.579 DEBUG Found not-modified response for: https://pypi.org/simple/idna/ 2025-09-07T09:15:13.9378180Z #43 0.579 
DEBUG Found not-modified response for: https://pypi.org/simple/urllib3/ 2025-09-07T09:15:13.9378864Z #43 0.579 DEBUG Found not-modified response for: https://pypi.org/simple/certifi/ 2025-09-07T09:15:13.9379567Z #43 0.579 DEBUG Found not-modified response for: https://pypi.org/simple/llvmlite/ 2025-09-07T09:15:13.9380317Z #43 0.579 DEBUG Found not-modified response for: https://pypi.org/simple/huggingface-hub/ 2025-09-07T09:15:13.9381069Z #43 0.579 DEBUG Found not-modified response for: https://pypi.org/simple/jinja2/ 2025-09-07T09:15:13.9381761Z #43 0.579 DEBUG Found not-modified response for: https://pypi.org/simple/httpx/ 2025-09-07T09:15:13.9382483Z #43 0.579 DEBUG Found not-modified response for: https://pypi.org/simple/python-multipart/ 2025-09-07T09:15:13.9383263Z #43 0.579 DEBUG Found not-modified response for: https://pypi.org/simple/email-validator/ 2025-09-07T09:15:13.9384013Z #43 0.579 DEBUG Found not-modified response for: https://pypi.org/simple/fastapi-cli/ 2025-09-07T09:15:13.9384751Z #43 0.579 DEBUG Found not-modified response for: https://pypi.org/simple/starlette/ 2025-09-07T09:15:13.9385549Z #43 0.580 DEBUG Found not-modified response for: https://pypi.org/simple/uvicorn/ 2025-09-07T09:15:13.9386213Z #43 0.580 DEBUG Found not-modified response for: https://pypi.org/simple/attrs/ 2025-09-07T09:15:13.9386892Z #43 0.580 DEBUG Found not-modified response for: https://pypi.org/simple/aiosignal/ 2025-09-07T09:15:13.9387623Z #43 0.580 DEBUG Found not-modified response for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T09:15:13.9388394Z #43 0.581 DEBUG Found not-modified response for: https://pypi.org/simple/aiohappyeyeballs/ 2025-09-07T09:15:13.9389119Z #43 0.581 DEBUG Found not-modified response for: https://pypi.org/simple/frozenlist/ 2025-09-07T09:15:13.9389828Z #43 0.581 DEBUG Found not-modified response for: https://pypi.org/simple/propcache/ 2025-09-07T09:15:13.9390514Z #43 0.581 DEBUG Found not-modified response for: 
https://pypi.org/simple/distro/ 2025-09-07T09:15:13.9391176Z #43 0.581 DEBUG Found not-modified response for: https://pypi.org/simple/anyio/ 2025-09-07T09:15:13.9391855Z #43 0.581 DEBUG Found not-modified response for: https://pypi.org/simple/jiter/ 2025-09-07T09:15:13.9392744Z #43 0.582 DEBUG Found not-modified response for: https://pypi.org/simple/annotated-types/ 2025-09-07T09:15:13.9393720Z #43 0.582 DEBUG Found not-modified response for: https://pypi.org/simple/typing-inspection/ 2025-09-07T09:15:13.9394461Z #43 0.582 DEBUG Found not-modified response for: https://pypi.org/simple/sniffio/ 2025-09-07T09:15:13.9395185Z #43 0.582 DEBUG Found not-modified response for: https://pypi.org/simple/safetensors/ 2025-09-07T09:15:13.9395845Z #43 0.583 DEBUG Found stale response for: https://pypi.org/simple/yarl/ 2025-09-07T09:15:13.9396492Z #43 0.583 DEBUG Sending revalidation request for: https://pypi.org/simple/yarl/ 2025-09-07T09:15:13.9397261Z #43 0.583 DEBUG Found stale response for: https://pypi.org/simple/pydantic-core/ 2025-09-07T09:15:13.9398000Z #43 0.583 DEBUG Sending revalidation request for: https://pypi.org/simple/pydantic-core/ 2025-09-07T09:15:13.9398712Z #43 0.584 DEBUG Found installed version of packaging==25.0 that satisfies * 2025-09-07T09:15:13.9399352Z #43 0.584 DEBUG Found installed version of packaging==25.0 that satisfies >=20.0 2025-09-07T09:15:13.9400052Z #43 0.584 DEBUG Found not-modified response for: https://pypi.org/simple/multidict/ 2025-09-07T09:15:13.9401443Z #43 0.590 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c4/01/72d6472f80651673716d1deda2a5bbb633e563ecf94f4479da5519d69d25/interegular-0.3.3-py37-none-any.whl.metadata 2025-09-07T09:15:13.9403250Z #43 0.590 DEBUG Found installed version of torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=1.10.0 2025-09-07T09:15:13.9404698Z #43 0.591 DEBUG Found installed version of 
jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies >=3.1.5 2025-09-07T09:15:13.9406150Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c3/88/97eef84f48fa04fbd6750e62dcceafba6c63c81b7ac1420856c8dcc0a3f9/astor-0.8.1-py2.py3-none-any.whl.metadata 2025-09-07T09:15:13.9407997Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/50/3d/9373ad9c56321fdab5b41197068e1d8c25883b3fea29dd361f9b55116869/dill-0.4.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9409907Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/ba/d0/d482c39cee2ab2978a892558cf130681d4574ea208e162da8958b31e9250/frozendict-2.4.6-py312-none-any.whl.metadata 2025-09-07T09:15:13.9412012Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/a7/c2/fe1e52489ae3122415c51f387e221dd0773709bad6c6cdaa599e8a2c5185/urllib3-2.5.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9413978Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/76/c6/c88e154df9c4e1a2a66ccf0005a88dfb2650c1dffb6f5ce603dfbd452ce3/idna-3.10-py3-none-any.whl.metadata 2025-09-07T09:15:13.9415896Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e5/48/1549795ba7742c948d2ad169c1c8cdbae65bc450d6cd753d124b17c8cd32/certifi-2025.8.3-py3-none-any.whl.metadata 2025-09-07T09:15:13.9418014Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/cb/da/8341fd3056419441286c8e26bf436923021005ece0bff5f41906476ae514/llvmlite-0.44.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9420185Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/39/7b/bb06b061991107cd8783f300adff3e7b7f284e330fd82f507f2a1417b11d/huggingface_hub-0.34.4-py3-none-any.whl.metadata 2025-09-07T09:15:13.9422130Z #43 0.592 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/2a/39/e50c7c3a983047577ee07d2a9e53faf5a69493943ec3f6a384bdc792deb2/httpx-0.28.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9424071Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/de/15/545e2b6cf2e3be84bc1ed85613edd75b8aea69807a71c26f4ca6a9258e82/email_validator-2.3.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9426143Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/45/58/38b5afbc1a800eeea951b9285d3912613f2603bdf897a4ab0f4bd7f405fc/python_multipart-0.0.20-py3-none-any.whl.metadata 2025-09-07T09:15:13.9428141Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/ce/fd/901cfa59aaa5b30a99e16876f11abe38b59a1a2c51ffb3d7142bb6089069/starlette-0.47.3-py3-none-any.whl.metadata 2025-09-07T09:15:13.9430049Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/77/06/bb80f5f86020c4551da315d78b3ab75e8228f89f0162f2c3a819e407941a/attrs-25.3.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9431994Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/fb/76/641ae371508676492379f16e2fa48f4e2c11741bd63c48be4b12a6b09cba/aiosignal-1.4.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9434284Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/16/ab/0233c3231af734f5dfcf0844aa9582d5a1466c985bbed6cedab85af9bfe3/charset_normalizer-3.4.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:15:13.9436647Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/0f/15/5bf3b99495fb160b63f95972b81750f18f7f4e02ad051373b669d17d44f2/aiohappyeyeballs-2.6.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9438995Z #43 0.592 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/8d/db/48421f62a6f77c553575201e89048e97198046b793f4a089c79a6e3268bd/frozenlist-1.7.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9441488Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/37/7c/54fd5301ef38505ab235d98827207176a5c9b2aa61939b10a460ca53e123/propcache-0.3.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9443602Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/12/b3/231ffd4ab1fc9d679809f356cebee130ac7daa00d6d6f3206dd4fd137e9e/distro-1.9.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9445552Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/6f/12/e5e0282d673bb9746bacfb6e2dba8719989d3660cdb2ea79aee9a9651afb/anyio-4.10.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9447508Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl.metadata 2025-09-07T09:15:13.9449648Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e2/ba/77013b0b8ba904bf3762f11e0129b8928bff7f978a81838dfcc958ad5728/jiter-0.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9451795Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/17/69/cd203477f944c353c31bade965f880aa1061fd6bf05ded0726ca845b6ff7/typing_inspection-0.4.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9453855Z #43 0.592 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9455964Z #43 0.592 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/fe/5d/5a514d7b88e310c8b146e2404e0dc161282e78634d9358975fd56dfd14be/safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9458403Z #43 0.593 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/af/65/753a2d8b05daf496f4a9c367fe844e90a1b2cac78e2be2c844200d10cc4c/multidict-6.6.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:15:13.9460246Z #43 0.593 DEBUG Found not-modified response for: https://pypi.org/simple/yarl/ 2025-09-07T09:15:13.9461255Z #43 0.594 DEBUG Found not-modified response for: https://pypi.org/simple/pydantic-core/ 2025-09-07T09:15:13.9463354Z #43 0.602 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/98/28/3ab7acc5b51f4434b181b0cee8f1f4b77a65919700a355fb3617f9488874/yarl-1.20.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9465693Z #43 0.605 DEBUG Searching for a compatible version of pydantic-core (>=2.33.2, <2.33.2+) 2025-09-07T09:15:13.9467004Z #43 0.605 DEBUG Selecting: pydantic-core==2.33.2 [compatible] (pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9469448Z #43 0.606 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/f9/41/4b043778cf9c4285d59742281a769eac371b9e47e35f98ad321349cc5d61/pydantic_core-2.33.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9471805Z #43 0.606 DEBUG Adding transitive dependency for pydantic-core==2.33.2: typing-extensions>=4.6.0, <4.7.0 | >=4.7.0+ 2025-09-07T09:15:13.9472934Z #43 0.606 DEBUG Searching for a compatible version of prometheus-client (>=0.18.0) 2025-09-07T09:15:13.9474041Z #43 0.606 DEBUG Selecting: prometheus-client==0.22.1 [compatible] (prometheus_client-0.22.1-py3-none-any.whl) 2025-09-07T09:15:13.9475039Z #43 0.606 DEBUG Searching for a compatible version of 
pillow (*) 2025-09-07T09:15:13.9476418Z #43 0.606 DEBUG Found installed version of pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies * 2025-09-07T09:15:13.9477541Z #43 0.606 DEBUG Selecting: pillow==11.3.0 [installed] (installed) 2025-09-07T09:15:13.9478222Z #43 0.606 DEBUG Searching for a compatible version of prometheus-fastapi-instrumentator (>=7.0.0) 2025-09-07T09:15:13.9479399Z #43 0.606 DEBUG Selecting: prometheus-fastapi-instrumentator==7.1.0 [compatible] (prometheus_fastapi_instrumentator-7.1.0-py3-none-any.whl) 2025-09-07T09:15:13.9481108Z #43 0.606 DEBUG Adding transitive dependency for prometheus-fastapi-instrumentator==7.1.0: prometheus-client>=0.8.0, <1.0.0 2025-09-07T09:15:13.9482302Z #43 0.606 DEBUG Adding transitive dependency for prometheus-fastapi-instrumentator==7.1.0: starlette>=0.30.0, <1.0.0 2025-09-07T09:15:13.9483429Z #43 0.606 DEBUG Searching for a compatible version of tiktoken (>=0.6.0) 2025-09-07T09:15:13.9484624Z #43 0.606 DEBUG Selecting: tiktoken==0.11.0 [compatible] (tiktoken-0.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9485866Z #43 0.606 DEBUG Adding transitive dependency for tiktoken==0.11.0: regex>=2022.1.18 2025-09-07T09:15:13.9486811Z #43 0.606 DEBUG Adding transitive dependency for tiktoken==0.11.0: requests>=2.26.0 2025-09-07T09:15:13.9488379Z #43 0.606 DEBUG Searching for a compatible version of llguidance{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (>=0.7.11, <0.8.0) 2025-09-07T09:15:13.9489694Z #43 0.606 DEBUG Selecting: llguidance==0.7.30 [compatible] (llguidance-0.7.30-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9490628Z #43 0.606 DEBUG Adding transitive dependency for llguidance==0.7.30: llguidance==0.7.30 2025-09-07T09:15:13.9491769Z #43 0.606 DEBUG Adding transitive dependency for llguidance==0.7.30: 
llguidance{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}==0.7.30 2025-09-07T09:15:13.9493086Z #43 0.606 DEBUG Searching for a compatible version of llguidance (==0.7.30) 2025-09-07T09:15:13.9493977Z #43 0.606 DEBUG Selecting: llguidance==0.7.30 [compatible] (llguidance-0.7.30-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9496353Z #43 0.606 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/af/80/5a40b9689f17612434b820854cba9b8cabd5142072c491b5280fe5f7a35e/llguidance-0.7.30-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9499049Z #43 0.606 DEBUG Searching for a compatible version of llguidance{platform_machine == 'aarch64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (==0.7.30) 2025-09-07T09:15:13.9500811Z #43 0.606 DEBUG Selecting: llguidance==0.7.30 [compatible] (llguidance-0.7.30-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9502081Z #43 0.606 DEBUG Searching for a compatible version of typing-extensions (>=4.12.2, <5) 2025-09-07T09:15:13.9503513Z #43 0.606 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.12.2, <5 2025-09-07T09:15:13.9504875Z #43 0.606 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T09:15:13.9505706Z #43 0.606 DEBUG Searching for a compatible version of filelock (>=3.16.1) 2025-09-07T09:15:13.9506959Z #43 0.606 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies >=3.16.1 2025-09-07T09:15:13.9508108Z #43 0.606 DEBUG Selecting: filelock==3.19.1 [installed] (installed) 2025-09-07T09:15:13.9508905Z #43 0.606 DEBUG Searching for a compatible version of partial-json-parser (*) 2025-09-07T09:15:13.9510110Z #43 0.606 DEBUG Selecting: partial-json-parser==0.2.1.1.post6 [compatible] 
(partial_json_parser-0.2.1.1.post6-py3-none-any.whl) 2025-09-07T09:15:13.9511010Z #43 0.606 DEBUG Searching for a compatible version of pyzmq (>=25.0.0) 2025-09-07T09:15:13.9512140Z #43 0.606 DEBUG Selecting: pyzmq==27.0.2 [compatible] (pyzmq-27.0.2-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:13.9513237Z #43 0.606 DEBUG Searching for a compatible version of msgspec (*) 2025-09-07T09:15:13.9514468Z #43 0.606 DEBUG Selecting: msgspec==0.19.0 [compatible] (msgspec-0.19.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9515606Z #43 0.606 DEBUG Searching for a compatible version of gguf (>=0.13.0) 2025-09-07T09:15:13.9516440Z #43 0.606 DEBUG Selecting: gguf==0.17.1 [compatible] (gguf-0.17.1-py3-none-any.whl) 2025-09-07T09:15:13.9517331Z #43 0.606 DEBUG Adding transitive dependency for gguf==0.17.1: numpy>=1.17 2025-09-07T09:15:13.9517994Z #43 0.606 DEBUG Adding transitive dependency for gguf==0.17.1: pyyaml>=5.1 2025-09-07T09:15:13.9518590Z #43 0.606 DEBUG Adding transitive dependency for gguf==0.17.1: tqdm>=4.27 2025-09-07T09:15:13.9519308Z #43 0.606 DEBUG Searching for a compatible version of mistral-common[audio] (>=1.8.2) 2025-09-07T09:15:13.9520184Z #43 0.606 DEBUG Selecting: mistral-common==1.8.4 [compatible] (mistral_common-1.8.4-py3-none-any.whl) 2025-09-07T09:15:13.9521148Z #43 0.606 DEBUG Adding transitive dependency for mistral-common==1.8.4: mistral-common==1.8.4 2025-09-07T09:15:13.9522292Z #43 0.606 DEBUG Adding transitive dependency for mistral-common==1.8.4: mistral-common[audio]==1.8.4 2025-09-07T09:15:13.9523405Z #43 0.606 DEBUG Searching for a compatible version of mistral-common (==1.8.4) 2025-09-07T09:15:13.9524363Z #43 0.606 DEBUG Selecting: mistral-common==1.8.4 [compatible] (mistral_common-1.8.4-py3-none-any.whl) 2025-09-07T09:15:13.9526506Z #43 0.606 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/d6/4f/756a66c608a767c7af7010b23992343e97558ce7f86c5c15929f1215f6ef/mistral_common-1.8.4-py3-none-any.whl.metadata 2025-09-07T09:15:13.9528098Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: pydantic>=2.7, <3.0 2025-09-07T09:15:13.9529249Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: jsonschema>=4.21.1 2025-09-07T09:15:13.9530297Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: typing-extensions>=4.11.0 2025-09-07T09:15:13.9531410Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: tiktoken>=0.7.0 2025-09-07T09:15:13.9532206Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: pillow>=10.3.0 2025-09-07T09:15:13.9533045Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: requests>=2.0.0 2025-09-07T09:15:13.9533883Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: numpy>=1.25 2025-09-07T09:15:13.9534740Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: pydantic-extra-types[pycountry]>=2.10.5 2025-09-07T09:15:13.9535592Z #43 0.607 DEBUG Searching for a compatible version of mistral-common[audio] (==1.8.4) 2025-09-07T09:15:13.9536376Z #43 0.607 DEBUG Selecting: mistral-common==1.8.4 [compatible] (mistral_common-1.8.4-py3-none-any.whl) 2025-09-07T09:15:13.9537188Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: soundfile>=0.12.1 2025-09-07T09:15:13.9537914Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: soxr>=0.5.0 2025-09-07T09:15:13.9538626Z #43 0.607 DEBUG Searching for a compatible version of mistral-common[image] (>=1.8.2) 2025-09-07T09:15:13.9539987Z #43 0.607 DEBUG Selecting: mistral-common==1.8.4 [compatible] (mistral_common-1.8.4-py3-none-any.whl) 2025-09-07T09:15:13.9541279Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: mistral-common==1.8.4 
2025-09-07T09:15:13.9542137Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: mistral-common[image]==1.8.4 2025-09-07T09:15:13.9542917Z #43 0.607 DEBUG Searching for a compatible version of mistral-common[image] (==1.8.4) 2025-09-07T09:15:13.9543704Z #43 0.607 DEBUG Selecting: mistral-common==1.8.4 [compatible] (mistral_common-1.8.4-py3-none-any.whl) 2025-09-07T09:15:13.9545119Z #43 0.607 DEBUG Adding transitive dependency for mistral-common==1.8.4: opencv-python-headless>=4.0.0 2025-09-07T09:15:13.9546208Z #43 0.607 DEBUG Searching for a compatible version of opencv-python-headless (>=4.11.0) 2025-09-07T09:15:13.9547312Z #43 0.607 DEBUG Selecting: opencv-python-headless==4.12.0.88 [compatible] (opencv_python_headless-4.12.0.88-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:13.9548586Z #43 0.607 DEBUG Adding transitive dependency for opencv-python-headless==4.12.0.88: numpy{python_full_version >= '3.9'}>=2, <2.3.0 2025-09-07T09:15:13.9550116Z #43 0.607 DEBUG Found stale response for: https://pypi.org/simple/pydantic-extra-types/ 2025-09-07T09:15:13.9551514Z #43 0.607 DEBUG Searching for a compatible version of numpy{python_full_version >= '3.9'} (>=2, <2.3.0) 2025-09-07T09:15:13.9552833Z #43 0.607 DEBUG Sending revalidation request for: https://pypi.org/simple/pydantic-extra-types/ 2025-09-07T09:15:13.9553877Z #43 0.607 DEBUG Selecting: numpy==2.2.6 [compatible] (numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9554711Z #43 0.607 DEBUG Adding transitive dependency for numpy==2.2.6: numpy==2.2.6 2025-09-07T09:15:13.9555466Z #43 0.607 DEBUG Adding transitive dependency for numpy==2.2.6: numpy{python_full_version >= '3.9'}==2.2.6 2025-09-07T09:15:13.9556374Z #43 0.607 DEBUG Searching for a compatible version of numpy{python_full_version >= '3.9'} (==2.2.6) 2025-09-07T09:15:13.9557285Z #43 0.607 DEBUG Selecting: numpy==2.2.6 [compatible] 
(numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9558131Z #43 0.607 DEBUG Found stale response for: https://pypi.org/simple/soundfile/ 2025-09-07T09:15:13.9558923Z #43 0.607 DEBUG Sending revalidation request for: https://pypi.org/simple/soundfile/ 2025-09-07T09:15:13.9560065Z #43 0.607 DEBUG Searching for a compatible version of pyyaml (>=5.1) 2025-09-07T09:15:13.9561398Z #43 0.607 DEBUG Selecting: pyyaml==6.0.2 [compatible] (PyYAML-6.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9562351Z #43 0.607 DEBUG Searching for a compatible version of six{python_full_version >= '3.12'} (>=1.16.0) 2025-09-07T09:15:13.9563110Z #43 0.607 DEBUG Selecting: six==1.17.0 [compatible] (six-1.17.0-py2.py3-none-any.whl) 2025-09-07T09:15:13.9563751Z #43 0.607 DEBUG Adding transitive dependency for six==1.17.0: six==1.17.0 2025-09-07T09:15:13.9564881Z #43 0.607 DEBUG Adding transitive dependency for six==1.17.0: six{python_full_version >= '3.12'}==1.17.0 2025-09-07T09:15:13.9565665Z #43 0.607 DEBUG Searching for a compatible version of six (==1.17.0) 2025-09-07T09:15:13.9566293Z #43 0.607 DEBUG Selecting: six==1.17.0 [compatible] (six-1.17.0-py2.py3-none-any.whl) 2025-09-07T09:15:13.9566974Z #43 0.607 DEBUG Found stale response for: https://pypi.org/simple/jsonschema/ 2025-09-07T09:15:13.9567675Z #43 0.607 DEBUG Sending revalidation request for: https://pypi.org/simple/jsonschema/ 2025-09-07T09:15:13.9569036Z #43 0.607 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl.metadata 2025-09-07T09:15:13.9570399Z #43 0.607 DEBUG Searching for a compatible version of six{python_full_version >= '3.12'} (==1.17.0) 2025-09-07T09:15:13.9571151Z #43 0.607 DEBUG Selecting: six==1.17.0 [compatible] (six-1.17.0-py2.py3-none-any.whl) 2025-09-07T09:15:13.9572360Z #43 0.607 DEBUG Searching for a compatible 
version of setuptools{python_full_version >= '3.12'} (>=77.0.3, <80) 2025-09-07T09:15:13.9573666Z #43 0.607 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies >=77.0.3, <80 2025-09-07T09:15:13.9574568Z #43 0.607 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T09:15:13.9575211Z #43 0.607 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools==78.1.0 2025-09-07T09:15:13.9576143Z #43 0.607 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools{python_full_version >= '3.12'}==78.1.0 2025-09-07T09:15:13.9576965Z #43 0.607 DEBUG Searching for a compatible version of setuptools (==78.1.0) 2025-09-07T09:15:13.9577866Z #43 0.607 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T09:15:13.9578763Z #43 0.607 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T09:15:13.9579632Z #43 0.607 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T09:15:13.9580743Z #43 0.607 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (==78.1.0) 2025-09-07T09:15:13.9581784Z #43 0.607 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T09:15:13.9582689Z #43 0.607 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T09:15:13.9583246Z #43 0.607 DEBUG Searching for a compatible version of einops (*) 2025-09-07T09:15:13.9584123Z #43 0.607 DEBUG Selecting: einops==0.8.1 [compatible] (einops-0.8.1-py3-none-any.whl) 2025-09-07T09:15:13.9585179Z #43 0.607 DEBUG Searching for a compatible version of cloudpickle (*) 2025-09-07T09:15:13.9585965Z #43 0.607 DEBUG Selecting: cloudpickle==3.1.1 [compatible] (cloudpickle-3.1.1-py3-none-any.whl) 
2025-09-07T09:15:13.9586662Z #43 0.607 DEBUG Searching for a compatible version of watchfiles (*) 2025-09-07T09:15:13.9587536Z #43 0.607 DEBUG Selecting: watchfiles==1.1.0 [compatible] (watchfiles-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9588435Z #43 0.607 DEBUG Adding transitive dependency for watchfiles==1.1.0: anyio>=3.0.0 2025-09-07T09:15:13.9589540Z #43 0.607 DEBUG Searching for a compatible version of python-json-logger (*) 2025-09-07T09:15:13.9590330Z #43 0.607 DEBUG Selecting: python-json-logger==3.3.0 [compatible] (python_json_logger-3.3.0-py3-none-any.whl) 2025-09-07T09:15:13.9591070Z #43 0.607 DEBUG Searching for a compatible version of scipy (*) 2025-09-07T09:15:13.9591859Z #43 0.607 DEBUG Selecting: scipy==1.16.1 [compatible] (scipy-1.16.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:13.9592893Z #43 0.607 DEBUG Found stale response for: https://pypi.org/simple/soxr/ 2025-09-07T09:15:13.9593636Z #43 0.607 DEBUG Sending revalidation request for: https://pypi.org/simple/soxr/ 2025-09-07T09:15:13.9594314Z #43 0.607 DEBUG Adding transitive dependency for scipy==1.16.1: numpy>=1.25.2, <2.6 2025-09-07T09:15:13.9595033Z #43 0.607 DEBUG Searching for a compatible version of ninja (*) 2025-09-07T09:15:13.9595799Z #43 0.607 DEBUG Selecting: ninja==1.13.0 [compatible] (ninja-1.13.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:13.9596575Z #43 0.607 DEBUG Searching for a compatible version of pybase64 (*) 2025-09-07T09:15:13.9598067Z #43 0.607 DEBUG Selecting: pybase64==1.4.2 [compatible] (pybase64-1.4.2-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl) 2025-09-07T09:15:13.9599059Z #43 0.608 DEBUG Searching for a compatible version of cbor2 (*) 2025-09-07T09:15:13.9599937Z #43 0.608 DEBUG Selecting: cbor2==5.7.0 [compatible] 
(cbor2-5.7.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:13.9601345Z #43 0.608 DEBUG Searching for a compatible version of setproctitle (*) 2025-09-07T09:15:13.9603092Z #43 0.608 DEBUG Selecting: setproctitle==1.3.7 [compatible] (setproctitle-1.3.7-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl) 2025-09-07T09:15:13.9604133Z #43 0.608 DEBUG Searching for a compatible version of openai-harmony (>=0.0.3) 2025-09-07T09:15:13.9605065Z #43 0.608 DEBUG Selecting: openai-harmony==0.0.4 [compatible] (openai_harmony-0.0.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9606131Z #43 0.608 DEBUG Adding transitive dependency for openai-harmony==0.0.4: pydantic>=2.11.7 2025-09-07T09:15:13.9606881Z #43 0.608 DEBUG Searching for a compatible version of ray[cgraph] (>=2.48.0) 2025-09-07T09:15:13.9607589Z #43 0.608 DEBUG Selecting: ray==2.49.1 [compatible] (ray-2.49.1-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9608345Z #43 0.608 DEBUG Adding transitive dependency for ray==2.49.1: ray==2.49.1 2025-09-07T09:15:13.9608980Z #43 0.608 DEBUG Adding transitive dependency for ray==2.49.1: ray[cgraph]==2.49.1 2025-09-07T09:15:13.9609902Z #43 0.608 DEBUG Searching for a compatible version of ray (==2.49.1) 2025-09-07T09:15:13.9611102Z #43 0.608 DEBUG Selecting: ray==2.49.1 [compatible] (ray-2.49.1-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9613019Z #43 0.608 DEBUG No cache entry for: https://files.pythonhosted.org/packages/00/02/c81260c0f94bd34a1442ea488bdd433dfc9e6ed6211c9a59bc4157b8e00e/ray-2.49.1-cp312-cp312-manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9614517Z #43 0.608 DEBUG Found not-modified response for: https://pypi.org/simple/soundfile/ 2025-09-07T09:15:13.9615250Z #43 0.608 DEBUG Found not-modified response for: https://pypi.org/simple/jsonschema/ 2025-09-07T09:15:13.9617125Z #43 0.609 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/57/5e/70bdd9579b35003a489fc850b5047beeda26328053ebadc1fb60f320f7db/soundfile-0.13.1-py2.py3-none-manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:15:13.9619931Z #43 0.609 DEBUG Found not-modified response for: https://pypi.org/simple/pydantic-extra-types/ 2025-09-07T09:15:13.9621390Z #43 0.609 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/bf/9c/8c95d856233c1f82500c2450b8c68576b4cf1c871db3afac5c34ff84e6fd/jsonschema-4.25.1-py3-none-any.whl.metadata 2025-09-07T09:15:13.9622709Z #43 0.609 DEBUG Found not-modified response for: https://pypi.org/simple/soxr/ 2025-09-07T09:15:13.9624249Z #43 0.610 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e1/1a/569ea0420a0c4801c2c8dd40d8d544989522f6014d51def689125f3f2935/soxr-0.5.0.post1-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9625747Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: click>=7.0 2025-09-07T09:15:13.9626657Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: filelock* 2025-09-07T09:15:13.9627545Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: jsonschema* 2025-09-07T09:15:13.9628211Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: msgpack>=1.0.0, <2.0.0 2025-09-07T09:15:13.9628873Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: packaging* 2025-09-07T09:15:13.9629576Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: protobuf>=3.20.3 2025-09-07T09:15:13.9630193Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: pyyaml* 2025-09-07T09:15:13.9630811Z #43 0.616 DEBUG Adding transitive dependency for ray==2.49.1: requests* 2025-09-07T09:15:13.9631432Z #43 0.616 DEBUG Searching for a compatible version of ray[cgraph] (==2.49.1) 2025-09-07T09:15:13.9632142Z #43 0.616 DEBUG Selecting: ray==2.49.1 [compatible] (ray-2.49.1-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9632977Z #43 0.616 DEBUG 
Adding transitive dependency for ray==2.49.1: cupy-cuda12x{sys_platform != 'darwin'}* 2025-09-07T09:15:13.9633717Z #43 0.616 DEBUG Searching for a compatible version of interegular (>=0.3.2) 2025-09-07T09:15:13.9634529Z #43 0.616 DEBUG Selecting: interegular==0.3.3 [compatible] (interegular-0.3.3-py37-none-any.whl) 2025-09-07T09:15:13.9635249Z #43 0.616 DEBUG Searching for a compatible version of packaging (>=20.0) 2025-09-07T09:15:13.9635882Z #43 0.616 DEBUG Found installed version of packaging==25.0 that satisfies >=20.0 2025-09-07T09:15:13.9636504Z #43 0.616 DEBUG Selecting: packaging==25.0 [installed] (installed) 2025-09-07T09:15:13.9637061Z #43 0.616 DEBUG No cache entry for: https://pypi.org/simple/msgpack/ 2025-09-07T09:15:13.9637645Z #43 0.616 DEBUG Searching for a compatible version of torch (>=1.10.0) 2025-09-07T09:15:13.9638787Z #43 0.616 DEBUG Found installed version of torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=1.10.0 2025-09-07T09:15:13.9639899Z #43 0.616 DEBUG Selecting: torch==2.9.0.dev20250906+cu128 [installed] (installed) 2025-09-07T09:15:13.9640542Z #43 0.616 DEBUG No cache entry for: https://pypi.org/simple/cupy-cuda12x/ 2025-09-07T09:15:13.9641217Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: filelock* 2025-09-07T09:15:13.9642052Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: typing-extensions>=4.10.0 2025-09-07T09:15:13.9643379Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: setuptools{python_full_version >= '3.12'}* 2025-09-07T09:15:13.9644626Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: sympy>=1.13.3 2025-09-07T09:15:13.9645493Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: networkx>=2.5.1 2025-09-07T09:15:13.9646245Z #43 0.616 DEBUG Adding transitive dependency for 
torch==2.9.0.dev20250906+cu128: jinja2* 2025-09-07T09:15:13.9647002Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: fsspec>=0.8.5 2025-09-07T09:15:13.9648172Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.93, <12.8.93+ 2025-09-07T09:15:13.9649698Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T09:15:13.9651232Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T09:15:13.9653720Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=9.10.2.21, <9.10.2.21+ 2025-09-07T09:15:13.9654780Z #43 0.616 DEBUG Found stale response for: https://pypi.org/simple/click/ 2025-09-07T09:15:13.9655849Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.4.1, <12.8.4.1+ 2025-09-07T09:15:13.9657105Z #43 0.616 DEBUG Sending revalidation request for: https://pypi.org/simple/click/ 2025-09-07T09:15:13.9659130Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.3.3.83, <11.3.3.83+ 2025-09-07T09:15:13.9660632Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=10.3.9.90, <10.3.9.90+ 2025-09-07T09:15:13.9663080Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusolver-cu12{platform_machine 
== 'x86_64' and sys_platform == 'linux'}>=11.7.3.90, <11.7.3.90+ 2025-09-07T09:15:13.9665068Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.5.8.93, <12.5.8.93+ 2025-09-07T09:15:13.9666652Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=0.7.1, <0.7.1+ 2025-09-07T09:15:13.9668113Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=2.27.5, <2.27.5+ 2025-09-07T09:15:13.9669568Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=3.3.20, <3.3.20+ 2025-09-07T09:15:13.9671058Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T09:15:13.9672527Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.93, <12.8.93+ 2025-09-07T09:15:13.9674027Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=1.13.1.3, <1.13.1.3+ 2025-09-07T09:15:13.9675494Z #43 0.616 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T09:15:13.9676515Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/sympy/ 2025-09-07T09:15:13.9677181Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/sympy/ 2025-09-07T09:15:13.9677876Z #43 0.617 DEBUG Found stale response 
for: https://pypi.org/simple/networkx/ 2025-09-07T09:15:13.9678562Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/networkx/ 2025-09-07T09:15:13.9679229Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/fsspec/ 2025-09-07T09:15:13.9679897Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/fsspec/ 2025-09-07T09:15:13.9680651Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T09:15:13.9681465Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T09:15:13.9682299Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T09:15:13.9683122Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T09:15:13.9683952Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T09:15:13.9684773Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T09:15:13.9685558Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T09:15:13.9686321Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T09:15:13.9687084Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T09:15:13.9687866Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T09:15:13.9688672Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T09:15:13.9689428Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T09:15:13.9690199Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T09:15:13.9690965Z #43 0.617 DEBUG Sending revalidation 
request for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T09:15:13.9691754Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T09:15:13.9692860Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T09:15:13.9693666Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T09:15:13.9694552Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T09:15:13.9695344Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T09:15:13.9696166Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T09:15:13.9696942Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T09:15:13.9697706Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T09:15:13.9698536Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T09:15:13.9699316Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T09:15:13.9700086Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T09:15:13.9700833Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T09:15:13.9701614Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T09:15:13.9702422Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T09:15:13.9703196Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T09:15:13.9703981Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufile-cu12/ 
2025-09-07T09:15:13.9704884Z #43 0.617 DEBUG Found stale response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T09:15:13.9705611Z #43 0.617 DEBUG Sending revalidation request for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T09:15:13.9706315Z #43 0.618 DEBUG Found not-modified response for: https://pypi.org/simple/click/ 2025-09-07T09:15:13.9706977Z #43 0.619 DEBUG Found not-modified response for: https://pypi.org/simple/sympy/ 2025-09-07T09:15:13.9707841Z #43 0.620 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T09:15:13.9708732Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T09:15:13.9709468Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/networkx/ 2025-09-07T09:15:13.9710217Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T09:15:13.9711046Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T09:15:13.9711794Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/fsspec/ 2025-09-07T09:15:13.9712503Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T09:15:13.9713300Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T09:15:13.9714088Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T09:15:13.9714875Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T09:15:13.9715710Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T09:15:13.9716469Z #43 0.625 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T09:15:13.9717255Z #43 0.625 DEBUG Found 
not-modified response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T09:15:13.9718025Z #43 0.626 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T09:15:13.9718806Z #43 0.626 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T09:15:13.9719614Z #43 0.626 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T09:15:13.9720400Z #43 0.626 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T09:15:13.9721218Z #43 0.626 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T09:15:13.9721975Z #43 0.626 DEBUG Found not-modified response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T09:15:13.9722995Z #43 0.626 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.93, <12.8.93+) 2025-09-07T09:15:13.9724581Z #43 0.626 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.8.93, <12.8.93+ 2025-09-07T09:15:13.9725798Z #43 0.626 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:13.9726622Z #43 0.626 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.8.93: nvidia-cuda-nvrtc-cu12==12.8.93 2025-09-07T09:15:13.9727815Z #43 0.626 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.8.93: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.93 2025-09-07T09:15:13.9728898Z #43 0.626 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12 (==12.8.93) 2025-09-07T09:15:13.9730116Z #43 0.626 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from 
file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:13.9731285Z #43 0.626 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:13.9732192Z #43 0.627 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T09:15:13.9733541Z #43 0.627 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T09:15:13.9734967Z #43 0.627 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:13.9736474Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.93) 2025-09-07T09:15:13.9737958Z #43 0.628 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:13.9739178Z #43 0.628 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:13.9740199Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T09:15:13.9741782Z #43 0.628 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T09:15:13.9743071Z #43 0.628 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:13.9744019Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.8.90: 
nvidia-cuda-runtime-cu12==12.8.90 2025-09-07T09:15:13.9745411Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.8.90: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T09:15:13.9746523Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12 (==12.8.90) 2025-09-07T09:15:13.9747754Z #43 0.628 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:13.9748960Z #43 0.628 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:13.9750205Z #43 0.628 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:13.9751676Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T09:15:13.9753194Z #43 0.628 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:13.9754400Z #43 0.628 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:13.9755444Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T09:15:13.9756963Z #43 0.628 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T09:15:13.9758187Z #43 0.628 DEBUG Selecting: 
nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:13.9758996Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.8.90: nvidia-cuda-cupti-cu12==12.8.90 2025-09-07T09:15:13.9760611Z #43 0.628 DEBUG No cache entry for: https://files.pythonhosted.org/packages/4d/ec/fd869e2567cc9c01278a736cfd1697941ba0d4b81a43e0aa2e8d71dab208/msgpack-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:13.9762401Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.8.90: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T09:15:13.9763481Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12 (==12.8.90) 2025-09-07T09:15:13.9764693Z #43 0.628 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:13.9765859Z #43 0.628 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:13.9767046Z #43 0.628 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:13.9768497Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T09:15:13.9769930Z #43 0.628 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:13.9771105Z #43 0.628 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:13.9772064Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine 
== 'x86_64' and sys_platform == 'linux'} (>=9.10.2.21, <9.10.2.21+) 2025-09-07T09:15:13.9773735Z #43 0.628 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=9.10.2.21, <9.10.2.21+ 2025-09-07T09:15:13.9774858Z #43 0.628 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T09:15:13.9775646Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12==9.10.2.21 2025-09-07T09:15:13.9776805Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==9.10.2.21 2025-09-07T09:15:13.9777885Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cudnn-cu12 (==9.10.2.21) 2025-09-07T09:15:13.9779078Z #43 0.628 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T09:15:13.9780160Z #43 0.628 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T09:15:13.9781217Z #43 0.628 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T09:15:13.9782419Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T09:15:13.9783452Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==9.10.2.21) 2025-09-07T09:15:13.9784850Z #43 0.628 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies * 2025-09-07T09:15:13.9786253Z #43 0.628 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from 
file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T09:15:13.9787292Z #43 0.628 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T09:15:13.9788019Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T09:15:13.9789095Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.4.1, <12.8.4.1+) 2025-09-07T09:15:13.9790466Z #43 0.628 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=12.8.4.1, <12.8.4.1+ 2025-09-07T09:15:13.9791545Z #43 0.628 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T09:15:13.9792661Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.8.4.1: nvidia-cublas-cu12==12.8.4.1 2025-09-07T09:15:13.9793814Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.8.4.1: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.4.1 2025-09-07T09:15:13.9794878Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cublas-cu12 (==12.8.4.1) 2025-09-07T09:15:13.9795996Z #43 0.628 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T09:15:13.9797466Z #43 0.628 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T09:15:13.9798539Z #43 0.628 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T09:15:13.9799468Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} 
(==12.8.4.1) 2025-09-07T09:15:13.9800820Z #43 0.628 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T09:15:13.9801972Z #43 0.628 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T09:15:13.9802938Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.3.3.83, <11.3.3.83+) 2025-09-07T09:15:13.9804581Z #43 0.628 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=11.3.3.83, <11.3.3.83+ 2025-09-07T09:15:13.9805751Z #43 0.628 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T09:15:13.9806511Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-cufft-cu12==11.3.3.83 2025-09-07T09:15:13.9807683Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.3.3.83 2025-09-07T09:15:13.9808686Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cufft-cu12 (==11.3.3.83) 2025-09-07T09:15:13.9809843Z #43 0.628 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T09:15:13.9811500Z #43 0.628 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T09:15:13.9812614Z #43 0.628 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T09:15:13.9813617Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: 
nvidia-nvjitlink-cu12* 2025-09-07T09:15:13.9814666Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.3.3.83) 2025-09-07T09:15:13.9816100Z #43 0.628 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies * 2025-09-07T09:15:13.9818023Z #43 0.628 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T09:15:13.9819180Z #43 0.628 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T09:15:13.9819965Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-nvjitlink-cu12* 2025-09-07T09:15:13.9821063Z #43 0.628 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=10.3.9.90, <10.3.9.90+) 2025-09-07T09:15:13.9822509Z #43 0.628 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=10.3.9.90, <10.3.9.90+ 2025-09-07T09:15:13.9823646Z #43 0.628 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T09:15:13.9824458Z #43 0.628 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.9.90: nvidia-curand-cu12==10.3.9.90 2025-09-07T09:15:13.9825722Z #43 0.628 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.9.90: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==10.3.9.90 2025-09-07T09:15:13.9826753Z #43 0.628 DEBUG Searching for a compatible version of nvidia-curand-cu12 (==10.3.9.90) 2025-09-07T09:15:13.9827829Z #43 0.628 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from 
file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T09:15:13.9829292Z #43 0.628 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T09:15:13.9830380Z #43 0.628 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T09:15:13.9831281Z #43 0.628 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==10.3.9.90) 2025-09-07T09:15:13.9832598Z #43 0.628 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T09:15:13.9833888Z #43 0.628 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T09:15:13.9834943Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.7.3.90, <11.7.3.90+) 2025-09-07T09:15:13.9836480Z #43 0.628 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=11.7.3.90, <11.7.3.90+ 2025-09-07T09:15:13.9837600Z #43 0.628 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T09:15:13.9838425Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusolver-cu12==11.7.3.90 2025-09-07T09:15:13.9839662Z #43 0.628 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.7.3.90 2025-09-07T09:15:13.9840715Z #43 0.628 DEBUG Searching for a compatible version of nvidia-cusolver-cu12 (==11.7.3.90) 2025-09-07T09:15:13.9841842Z #43 0.628 DEBUG Found installed version of 
nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T09:15:13.9842934Z #43 0.628 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T09:15:13.9844020Z #43 0.628 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T09:15:13.9845204Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cublas-cu12* 2025-09-07T09:15:13.9846101Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-nvjitlink-cu12* 2025-09-07T09:15:13.9846990Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusparse-cu12* 2025-09-07T09:15:13.9848033Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.7.3.90) 2025-09-07T09:15:13.9849432Z #43 0.629 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies * 2025-09-07T09:15:13.9851016Z #43 0.629 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T09:15:13.9852108Z #43 0.629 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T09:15:13.9852966Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cublas-cu12* 2025-09-07T09:15:13.9854045Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-nvjitlink-cu12* 2025-09-07T09:15:13.9854939Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: 
nvidia-cusparse-cu12* 2025-09-07T09:15:13.9856063Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.5.8.93, <12.5.8.93+) 2025-09-07T09:15:13.9857651Z #43 0.629 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.5.8.93, <12.5.8.93+ 2025-09-07T09:15:13.9858939Z #43 0.629 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T09:15:13.9859791Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-cusparse-cu12==12.5.8.93 2025-09-07T09:15:13.9861021Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.5.8.93 2025-09-07T09:15:13.9862125Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cusparse-cu12 (==12.5.8.93) 2025-09-07T09:15:13.9863379Z #43 0.629 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T09:15:13.9865252Z #43 0.629 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T09:15:13.9866437Z #43 0.629 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T09:15:13.9867221Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-nvjitlink-cu12* 2025-09-07T09:15:13.9868284Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.5.8.93) 2025-09-07T09:15:13.9869740Z #43 0.629 DEBUG Found installed 
version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T09:15:13.9870909Z #43 0.629 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T09:15:13.9871690Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-nvjitlink-cu12* 2025-09-07T09:15:13.9872768Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=0.7.1, <0.7.1+) 2025-09-07T09:15:13.9874171Z #43 0.629 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies >=0.7.1, <0.7.1+ 2025-09-07T09:15:13.9875316Z #43 0.629 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T09:15:13.9876123Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12==0.7.1 2025-09-07T09:15:13.9877303Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==0.7.1 2025-09-07T09:15:13.9878368Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12 (==0.7.1) 2025-09-07T09:15:13.9879569Z #43 0.629 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T09:15:13.9881052Z #43 0.629 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T09:15:13.9882119Z #43 0.629 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T09:15:13.9883045Z #43 0.629 DEBUG Searching for a compatible version of 
nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==0.7.1) 2025-09-07T09:15:13.9884383Z #43 0.629 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T09:15:13.9885448Z #43 0.629 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T09:15:13.9886359Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=2.27.5, <2.27.5+) 2025-09-07T09:15:13.9887818Z #43 0.629 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=2.27.5, <2.27.5+ 2025-09-07T09:15:13.9888928Z #43 0.629 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T09:15:13.9889649Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12==2.27.5 2025-09-07T09:15:13.9890708Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==2.27.5 2025-09-07T09:15:13.9891725Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nccl-cu12 (==2.27.5) 2025-09-07T09:15:13.9893275Z #43 0.629 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T09:15:13.9894871Z #43 0.629 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T09:15:13.9895991Z #43 0.629 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T09:15:13.9896975Z #43 0.629 DEBUG Searching for a compatible version of 
nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==2.27.5) 2025-09-07T09:15:13.9898356Z #43 0.629 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T09:15:13.9899475Z #43 0.629 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T09:15:13.9900431Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=3.3.20, <3.3.20+) 2025-09-07T09:15:13.9901930Z #43 0.629 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=3.3.20, <3.3.20+ 2025-09-07T09:15:13.9903135Z #43 0.629 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T09:15:13.9903963Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12==3.3.20 2025-09-07T09:15:13.9905234Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.3.20 2025-09-07T09:15:13.9906264Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12 (==3.3.20) 2025-09-07T09:15:13.9907419Z #43 0.629 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T09:15:13.9908550Z #43 0.629 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T09:15:13.9909685Z #43 0.629 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T09:15:13.9911069Z #43 0.629 DEBUG Searching 
for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.3.20) 2025-09-07T09:15:13.9912467Z #43 0.629 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T09:15:13.9913580Z #43 0.629 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T09:15:13.9914499Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T09:15:13.9915967Z #43 0.629 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T09:15:13.9917081Z #43 0.629 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:13.9917815Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.8.90: nvidia-nvtx-cu12==12.8.90 2025-09-07T09:15:13.9918889Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.8.90: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T09:15:13.9919858Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvtx-cu12 (==12.8.90) 2025-09-07T09:15:13.9921014Z #43 0.629 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:13.9922093Z #43 0.629 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:13.9923177Z #43 0.629 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:13.9924563Z #43 0.629 
DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T09:15:13.9925906Z #43 0.629 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:13.9926991Z #43 0.629 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:13.9927933Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.93, <12.8.93+) 2025-09-07T09:15:13.9929432Z #43 0.629 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.8.93, <12.8.93+ 2025-09-07T09:15:13.9930640Z #43 0.629 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:13.9931465Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.8.93: nvidia-nvjitlink-cu12==12.8.93 2025-09-07T09:15:13.9932661Z #43 0.629 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.8.93: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.93 2025-09-07T09:15:13.9934005Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12 (==12.8.93) 2025-09-07T09:15:13.9935234Z #43 0.629 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:13.9936451Z #43 0.629 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:13.9937417Z #43 0.629 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.93) 
2025-09-07T09:15:13.9938901Z #43 0.629 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:13.9940105Z #43 0.629 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:13.9941073Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=1.13.1.3, <1.13.1.3+) 2025-09-07T09:15:13.9942597Z #43 0.629 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=1.13.1.3, <1.13.1.3+ 2025-09-07T09:15:13.9943865Z #43 0.629 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T09:15:13.9944650Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.13.1.3: nvidia-cufile-cu12==1.13.1.3 2025-09-07T09:15:13.9945905Z #43 0.629 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.13.1.3: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==1.13.1.3 2025-09-07T09:15:13.9946930Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cufile-cu12 (==1.13.1.3) 2025-09-07T09:15:13.9948085Z #43 0.629 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T09:15:13.9949253Z #43 0.629 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T09:15:13.9950405Z #43 0.629 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:13.9952064Z #43 0.629 DEBUG Found installed version of 
nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T09:15:13.9953502Z #43 0.629 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==1.13.1.3) 2025-09-07T09:15:13.9954886Z #43 0.629 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T09:15:13.9956025Z #43 0.629 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T09:15:13.9956951Z #43 0.629 DEBUG Searching for a compatible version of pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T09:15:13.9958493Z #43 0.629 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:15:13.9959824Z #43 0.629 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T09:15:13.9960666Z #43 0.629 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton==3.4.0+gitf7888497 2025-09-07T09:15:13.9961880Z #43 0.629 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T09:15:13.9962982Z #43 0.629 DEBUG Searching for a compatible version of pytorch-triton (==3.4.0+gitf7888497) 2025-09-07T09:15:13.9964291Z #43 0.629 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:15:13.9965580Z #43 0.629 DEBUG Selecting: 
pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T09:15:13.9966876Z #43 0.629 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:15:13.9968239Z #43 0.629 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T09:15:13.9969267Z #43 0.629 DEBUG Searching for a compatible version of pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T09:15:13.9970815Z #43 0.629 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:15:13.9972121Z #43 0.629 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T09:15:13.9972972Z #43 0.629 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T09:15:13.9974075Z #43 0.629 DEBUG Searching for a compatible version of triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (*) 2025-09-07T09:15:13.9975124Z #43 0.629 DEBUG Selecting: triton==3.4.0 [compatible] (triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:13.9975980Z #43 0.629 DEBUG Adding transitive dependency for triton==3.4.0: triton==3.4.0 2025-09-07T09:15:13.9976860Z #43 0.629 DEBUG Adding transitive dependency for triton==3.4.0: triton{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.4.0 2025-09-07T09:15:13.9977762Z #43 0.629 DEBUG Searching for a compatible version of triton (==3.4.0) 2025-09-07T09:15:13.9978568Z #43 0.629 DEBUG Selecting: triton==3.4.0 [compatible] (triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:13.9980257Z #43 
0.630 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d0/66/b1eb52839f563623d185f0927eb3530ee4d5ffe9d377cdaf5346b306689e/triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:15:13.9981811Z #43 0.630 DEBUG Adding transitive dependency for triton==3.4.0: setuptools>=40.8.0 2025-09-07T09:15:13.9982685Z #43 0.630 DEBUG Searching for a compatible version of triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.4.0) 2025-09-07T09:15:13.9983740Z #43 0.630 DEBUG Selecting: triton==3.4.0 [compatible] (triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:13.9984615Z #43 0.630 DEBUG Adding transitive dependency for triton==3.4.0: setuptools>=40.8.0 2025-09-07T09:15:13.9985345Z #43 0.630 DEBUG Searching for a compatible version of frozendict (*) 2025-09-07T09:15:13.9986010Z #43 0.630 DEBUG Selecting: frozendict==2.4.6 [compatible] (frozendict-2.4.6-py312-none-any.whl) 2025-09-07T09:15:13.9986650Z #43 0.630 DEBUG Searching for a compatible version of astor (*) 2025-09-07T09:15:13.9987255Z #43 0.630 DEBUG Selecting: astor==0.8.1 [compatible] (astor-0.8.1-py2.py3-none-any.whl) 2025-09-07T09:15:13.9987846Z #43 0.630 DEBUG Searching for a compatible version of dill (*) 2025-09-07T09:15:13.9988446Z #43 0.630 DEBUG Selecting: dill==0.4.0 [compatible] (dill-0.4.0-py3-none-any.whl) 2025-09-07T09:15:13.9989100Z #43 0.630 DEBUG Searching for a compatible version of llvmlite (>=0.44.0.dev0, <0.45) 2025-09-07T09:15:13.9989966Z #43 0.630 DEBUG Selecting: llvmlite==0.44.0 [compatible] (llvmlite-0.44.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:13.9990844Z #43 0.630 DEBUG Searching for a compatible version of charset-normalizer (>=2, <4) 2025-09-07T09:15:13.9992100Z #43 0.630 DEBUG Selecting: charset-normalizer==3.4.3 [compatible] (charset_normalizer-3.4.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl) 
2025-09-07T09:15:13.9993332Z #43 0.630 DEBUG Searching for a compatible version of idna (>=2.5, <4) 2025-09-07T09:15:13.9993949Z #43 0.630 DEBUG Selecting: idna==3.10 [compatible] (idna-3.10-py3-none-any.whl) 2025-09-07T09:15:13.9994577Z #43 0.630 DEBUG Searching for a compatible version of urllib3 (>=1.21.1, <3) 2025-09-07T09:15:13.9995249Z #43 0.630 DEBUG Selecting: urllib3==2.5.0 [compatible] (urllib3-2.5.0-py3-none-any.whl) 2025-09-07T09:15:13.9995905Z #43 0.630 DEBUG Searching for a compatible version of certifi (>=2017.4.17) 2025-09-07T09:15:13.9996601Z #43 0.630 DEBUG Selecting: certifi==2025.8.3 [compatible] (certifi-2025.8.3-py3-none-any.whl) 2025-09-07T09:15:13.9997337Z #43 0.630 DEBUG Searching for a compatible version of huggingface-hub (>=0.34.0, <1.0) 2025-09-07T09:15:13.9998126Z #43 0.630 DEBUG Selecting: huggingface-hub==0.34.4 [compatible] (huggingface_hub-0.34.4-py3-none-any.whl) 2025-09-07T09:15:13.9998999Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: filelock* 2025-09-07T09:15:13.9999726Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: fsspec>=2023.5.0 2025-09-07T09:15:14.0000494Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: packaging>=20.9 2025-09-07T09:15:14.0001235Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: pyyaml>=5.1 2025-09-07T09:15:14.0001962Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: requests* 2025-09-07T09:15:14.0002689Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: tqdm>=4.42.1 2025-09-07T09:15:14.0003476Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: typing-extensions>=3.7.4.3 2025-09-07T09:15:14.0004955Z #43 0.630 DEBUG Adding transitive dependency for huggingface-hub==0.34.4: hf-xet{platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}>=1.1.3, <2.0.0 
2025-09-07T09:15:14.0006119Z #43 0.630 DEBUG Searching for a compatible version of safetensors (>=0.4.3) 2025-09-07T09:15:14.0006970Z #43 0.630 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=2023.5.0 2025-09-07T09:15:14.0008134Z #43 0.630 DEBUG Selecting: safetensors==0.6.2 [compatible] (safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0009008Z #43 0.630 DEBUG Searching for a compatible version of starlette (>=0.40.0, <0.48.0) 2025-09-07T09:15:14.0009718Z #43 0.630 DEBUG Selecting: starlette==0.47.3 [compatible] (starlette-0.47.3-py3-none-any.whl) 2025-09-07T09:15:14.0010440Z #43 0.630 DEBUG Adding transitive dependency for starlette==0.47.3: anyio>=3.6.2, <5 2025-09-07T09:15:14.0011280Z #43 0.630 DEBUG Adding transitive dependency for starlette==0.47.3: typing-extensions{python_full_version < '3.13'}>=4.10.0 2025-09-07T09:15:14.0012251Z #43 0.630 DEBUG Searching for a compatible version of typing-extensions{python_full_version < '3.13'} (>=4.10.0) 2025-09-07T09:15:14.0013609Z #43 0.630 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T09:15:14.0014593Z #43 0.630 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T09:15:14.0015414Z #43 0.630 DEBUG Adding transitive dependency for typing-extensions==4.14.1: typing-extensions==4.14.1 2025-09-07T09:15:14.0016414Z #43 0.630 DEBUG Adding transitive dependency for typing-extensions==4.14.1: typing-extensions{python_full_version < '3.13'}==4.14.1 2025-09-07T09:15:14.0017303Z #43 0.630 DEBUG Found stale response for: https://pypi.org/simple/hf-xet/ 2025-09-07T09:15:14.0017964Z #43 0.630 DEBUG Sending revalidation request for: https://pypi.org/simple/hf-xet/ 2025-09-07T09:15:14.0018800Z #43 0.630 DEBUG Searching for a compatible version of typing-extensions{python_full_version 
< '3.13'} (==4.14.1) 2025-09-07T09:15:14.0019941Z #43 0.630 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies ==4.14.1 2025-09-07T09:15:14.0020907Z #43 0.630 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T09:15:14.0021584Z #43 0.630 DEBUG Searching for a compatible version of fastapi-cli[standard] (>=0.0.8) 2025-09-07T09:15:14.0022333Z #43 0.630 DEBUG Selecting: fastapi-cli==0.0.10 [compatible] (fastapi_cli-0.0.10-py3-none-any.whl) 2025-09-07T09:15:14.0023110Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: fastapi-cli==0.0.10 2025-09-07T09:15:14.0023897Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: fastapi-cli[standard]==0.0.10 2025-09-07T09:15:14.0024638Z #43 0.630 DEBUG Searching for a compatible version of fastapi-cli (==0.0.10) 2025-09-07T09:15:14.0025451Z #43 0.630 DEBUG Selecting: fastapi-cli==0.0.10 [compatible] (fastapi_cli-0.0.10-py3-none-any.whl) 2025-09-07T09:15:14.0026934Z #43 0.630 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/7c/62/0f00036925c0614e333a2baf739c861453a6779331ffb47ec9a6147f860b/fastapi_cli-0.0.10-py3-none-any.whl.metadata 2025-09-07T09:15:14.0028217Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: typer>=0.15.1 2025-09-07T09:15:14.0028965Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: uvicorn[standard]>=0.15.0 2025-09-07T09:15:14.0029720Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: rich-toolkit>=0.14.8 2025-09-07T09:15:14.0030438Z #43 0.630 DEBUG Searching for a compatible version of fastapi-cli[standard] (==0.0.10) 2025-09-07T09:15:14.0031158Z #43 0.630 DEBUG Selecting: fastapi-cli==0.0.10 [compatible] (fastapi_cli-0.0.10-py3-none-any.whl) 2025-09-07T09:15:14.0031978Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: uvicorn[standard]>=0.15.0 
2025-09-07T09:15:14.0032759Z #43 0.630 DEBUG Adding transitive dependency for fastapi-cli==0.0.10: fastapi-cloud-cli>=0.1.1 2025-09-07T09:15:14.0033453Z #43 0.630 DEBUG Searching for a compatible version of httpx (>=0.23.0, <1) 2025-09-07T09:15:14.0034085Z #43 0.630 DEBUG Selecting: httpx==0.28.1 [compatible] (httpx-0.28.1-py3-none-any.whl) 2025-09-07T09:15:14.0034691Z #43 0.630 DEBUG Adding transitive dependency for httpx==0.28.1: anyio* 2025-09-07T09:15:14.0035306Z #43 0.630 DEBUG Adding transitive dependency for httpx==0.28.1: certifi* 2025-09-07T09:15:14.0035944Z #43 0.630 DEBUG Adding transitive dependency for httpx==0.28.1: httpcore>=1.dev0, <2.dev0 2025-09-07T09:15:14.0036584Z #43 0.630 DEBUG Adding transitive dependency for httpx==0.28.1: idna* 2025-09-07T09:15:14.0037143Z #43 0.630 DEBUG Searching for a compatible version of jinja2 (>=3.1.5) 2025-09-07T09:15:14.0037930Z #43 0.630 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies >=3.1.5 2025-09-07T09:15:14.0038700Z #43 0.630 DEBUG Selecting: jinja2==3.1.6 [installed] (installed) 2025-09-07T09:15:14.0039289Z #43 0.630 DEBUG Adding transitive dependency for jinja2==3.1.6: markupsafe>=2.0 2025-09-07T09:15:14.0039944Z #43 0.630 DEBUG Searching for a compatible version of python-multipart (>=0.0.18) 2025-09-07T09:15:14.0040707Z #43 0.630 DEBUG Selecting: python-multipart==0.0.20 [compatible] (python_multipart-0.0.20-py3-none-any.whl) 2025-09-07T09:15:14.0041532Z #43 0.630 DEBUG Searching for a compatible version of email-validator (>=2.0.0) 2025-09-07T09:15:14.0042286Z #43 0.630 DEBUG Selecting: email-validator==2.3.0 [compatible] (email_validator-2.3.0-py3-none-any.whl) 2025-09-07T09:15:14.0043065Z #43 0.630 DEBUG Adding transitive dependency for email-validator==2.3.0: dnspython>=2.0.0 2025-09-07T09:15:14.0043784Z #43 0.630 DEBUG Adding transitive dependency for email-validator==2.3.0: idna>=2.0.0 2025-09-07T09:15:14.0044455Z #43 0.630 DEBUG 
Searching for a compatible version of uvicorn[standard] (>=0.15.0) 2025-09-07T09:15:14.0045143Z #43 0.631 DEBUG Selecting: uvicorn==0.35.0 [compatible] (uvicorn-0.35.0-py3-none-any.whl) 2025-09-07T09:15:14.0045859Z #43 0.631 DEBUG Found stale response for: https://pypi.org/simple/fastapi-cloud-cli/ 2025-09-07T09:15:14.0046535Z #43 0.631 DEBUG Adding transitive dependency for uvicorn==0.35.0: uvicorn==0.35.0 2025-09-07T09:15:14.0047274Z #43 0.631 DEBUG Sending revalidation request for: https://pypi.org/simple/fastapi-cloud-cli/ 2025-09-07T09:15:14.0048034Z #43 0.631 DEBUG Adding transitive dependency for uvicorn==0.35.0: uvicorn[standard]==0.35.0 2025-09-07T09:15:14.0048704Z #43 0.631 DEBUG Searching for a compatible version of uvicorn (==0.35.0) 2025-09-07T09:15:14.0049334Z #43 0.631 DEBUG Selecting: uvicorn==0.35.0 [compatible] (uvicorn-0.35.0-py3-none-any.whl) 2025-09-07T09:15:14.0049989Z #43 0.631 DEBUG Found stale response for: https://pypi.org/simple/typer/ 2025-09-07T09:15:14.0050629Z #43 0.631 DEBUG Sending revalidation request for: https://pypi.org/simple/typer/ 2025-09-07T09:15:14.0051279Z #43 0.631 DEBUG Found stale response for: https://pypi.org/simple/rich-toolkit/ 2025-09-07T09:15:14.0052017Z #43 0.631 DEBUG Sending revalidation request for: https://pypi.org/simple/rich-toolkit/ 2025-09-07T09:15:14.0052782Z #43 0.631 DEBUG Found stale response for: https://pypi.org/simple/httpcore/ 2025-09-07T09:15:14.0053634Z #43 0.631 DEBUG Sending revalidation request for: https://pypi.org/simple/httpcore/ 2025-09-07T09:15:14.0054333Z #43 0.631 DEBUG Found stale response for: https://pypi.org/simple/dnspython/ 2025-09-07T09:15:14.0055021Z #43 0.631 DEBUG Sending revalidation request for: https://pypi.org/simple/dnspython/ 2025-09-07T09:15:14.0056370Z #43 0.631 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/d2/e2/dc81b1bd1dcfe91735810265e9d26bc8ec5da45b4c0f6237e286819194c3/uvicorn-0.35.0-py3-none-any.whl.metadata 2025-09-07T09:15:14.0057690Z 
#43 0.632 DEBUG Found stale response for: https://pypi.org/simple/markupsafe/ 2025-09-07T09:15:14.0058337Z #43 0.632 DEBUG Adding transitive dependency for uvicorn==0.35.0: click>=7.0 2025-09-07T09:15:14.0058959Z #43 0.632 DEBUG Adding transitive dependency for uvicorn==0.35.0: h11>=0.8 2025-09-07T09:15:14.0059629Z #43 0.632 DEBUG Sending revalidation request for: https://pypi.org/simple/markupsafe/ 2025-09-07T09:15:14.0060346Z #43 0.632 DEBUG Searching for a compatible version of uvicorn[standard] (==0.35.0) 2025-09-07T09:15:14.0061071Z #43 0.632 DEBUG Selecting: uvicorn==0.35.0 [compatible] (uvicorn-0.35.0-py3-none-any.whl) 2025-09-07T09:15:14.0061782Z #43 0.632 DEBUG Adding transitive dependency for uvicorn==0.35.0: httptools>=0.6.3 2025-09-07T09:15:14.0062474Z #43 0.632 DEBUG Adding transitive dependency for uvicorn==0.35.0: python-dotenv>=0.13 2025-09-07T09:15:14.0063158Z #43 0.632 DEBUG Adding transitive dependency for uvicorn==0.35.0: pyyaml>=5.1 2025-09-07T09:15:14.0064232Z #43 0.632 DEBUG Adding transitive dependency for uvicorn==0.35.0: uvloop{platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'}>=0.15.1 2025-09-07T09:15:14.0065429Z #43 0.632 DEBUG Adding transitive dependency for uvicorn==0.35.0: watchfiles>=0.13 2025-09-07T09:15:14.0066099Z #43 0.632 DEBUG Adding transitive dependency for uvicorn==0.35.0: websockets>=10.4 2025-09-07T09:15:14.0066750Z #43 0.632 DEBUG Searching for a compatible version of aiohappyeyeballs (>=2.5.0) 2025-09-07T09:15:14.0067561Z #43 0.632 DEBUG Selecting: aiohappyeyeballs==2.6.1 [compatible] (aiohappyeyeballs-2.6.1-py3-none-any.whl) 2025-09-07T09:15:14.0068308Z #43 0.632 DEBUG Searching for a compatible version of aiosignal (>=1.4.0) 2025-09-07T09:15:14.0068961Z #43 0.632 DEBUG Selecting: aiosignal==1.4.0 [compatible] (aiosignal-1.4.0-py3-none-any.whl) 2025-09-07T09:15:14.0069676Z #43 0.632 DEBUG Adding transitive dependency for aiosignal==1.4.0: frozenlist>=1.1.0 
2025-09-07T09:15:14.0070501Z #43 0.632 DEBUG Adding transitive dependency for aiosignal==1.4.0: typing-extensions{python_full_version < '3.13'}>=4.2 2025-09-07T09:15:14.0071279Z #43 0.632 DEBUG Searching for a compatible version of attrs (>=17.3.0) 2025-09-07T09:15:14.0071898Z #43 0.632 DEBUG Selecting: attrs==25.3.0 [compatible] (attrs-25.3.0-py3-none-any.whl) 2025-09-07T09:15:14.0072518Z #43 0.632 DEBUG Searching for a compatible version of frozenlist (>=1.1.1) 2025-09-07T09:15:14.0073545Z #43 0.632 DEBUG Selecting: frozenlist==1.7.0 [compatible] (frozenlist-1.7.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0074558Z #43 0.632 DEBUG Searching for a compatible version of multidict (>=4.5, <7.0) 2025-09-07T09:15:14.0075167Z #43 0.632 DEBUG Found stale response for: https://pypi.org/simple/h11/ 2025-09-07T09:15:14.0076107Z #43 0.632 DEBUG Selecting: multidict==6.6.4 [compatible] (multidict-6.6.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:14.0077073Z #43 0.632 DEBUG Sending revalidation request for: https://pypi.org/simple/h11/ 2025-09-07T09:15:14.0077707Z #43 0.632 DEBUG Searching for a compatible version of propcache (>=0.2.0) 2025-09-07T09:15:14.0078574Z #43 0.632 DEBUG Selecting: propcache==0.3.2 [compatible] (propcache-0.3.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0079420Z #43 0.632 DEBUG Searching for a compatible version of yarl (>=1.17.0, <2.0) 2025-09-07T09:15:14.0080056Z #43 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/hf-xet/ 2025-09-07T09:15:14.0080874Z #43 0.632 DEBUG Selecting: yarl==1.20.1 [compatible] (yarl-1.20.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0081654Z #43 0.632 DEBUG Adding transitive dependency for yarl==1.20.1: idna>=2.0 2025-09-07T09:15:14.0082240Z #43 0.632 DEBUG Adding transitive dependency for 
yarl==1.20.1: multidict>=4.0 2025-09-07T09:15:14.0082878Z #43 0.632 DEBUG Adding transitive dependency for yarl==1.20.1: propcache>=0.2.1 2025-09-07T09:15:14.0083514Z #43 0.632 DEBUG Searching for a compatible version of anyio (>=3.6.2, <5) 2025-09-07T09:15:14.0084135Z #43 0.632 DEBUG Selecting: anyio==4.10.0 [compatible] (anyio-4.10.0-py3-none-any.whl) 2025-09-07T09:15:14.0084767Z #43 0.632 DEBUG Adding transitive dependency for anyio==4.10.0: idna>=2.8 2025-09-07T09:15:14.0085354Z #43 0.632 DEBUG Adding transitive dependency for anyio==4.10.0: sniffio>=1.1 2025-09-07T09:15:14.0086132Z #43 0.632 DEBUG Adding transitive dependency for anyio==4.10.0: typing-extensions{python_full_version < '3.13'}>=4.5 2025-09-07T09:15:14.0086958Z #43 0.632 DEBUG Found stale response for: https://pypi.org/simple/python-dotenv/ 2025-09-07T09:15:14.0087670Z #43 0.632 DEBUG Sending revalidation request for: https://pypi.org/simple/python-dotenv/ 2025-09-07T09:15:14.0088343Z #43 0.632 DEBUG Searching for a compatible version of distro (>=1.7.0, <2) 2025-09-07T09:15:14.0088961Z #43 0.632 DEBUG Selecting: distro==1.9.0 [compatible] (distro-1.9.0-py3-none-any.whl) 2025-09-07T09:15:14.0089591Z #43 0.632 DEBUG Searching for a compatible version of jiter (>=0.4.0, <1) 2025-09-07T09:15:14.0090371Z #43 0.632 DEBUG Selecting: jiter==0.10.0 [compatible] (jiter-0.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0091180Z #43 0.632 DEBUG Found stale response for: https://pypi.org/simple/uvloop/ 2025-09-07T09:15:14.0091818Z #43 0.632 DEBUG Sending revalidation request for: https://pypi.org/simple/uvloop/ 2025-09-07T09:15:14.0092931Z #43 0.632 DEBUG Found stale response for: https://pypi.org/simple/httptools/ 2025-09-07T09:15:14.0093715Z #43 0.632 DEBUG Sending revalidation request for: https://pypi.org/simple/httptools/ 2025-09-07T09:15:14.0094369Z #43 0.632 DEBUG Searching for a compatible version of sniffio (>=1.1) 2025-09-07T09:15:14.0095021Z #43 0.632 DEBUG 
Selecting: sniffio==1.3.1 [compatible] (sniffio-1.3.1-py3-none-any.whl) 2025-09-07T09:15:14.0095705Z #43 0.632 DEBUG Searching for a compatible version of annotated-types (>=0.6.0) 2025-09-07T09:15:14.0096482Z #43 0.632 DEBUG Selecting: annotated-types==0.7.0 [compatible] (annotated_types-0.7.0-py3-none-any.whl) 2025-09-07T09:15:14.0097260Z #43 0.632 DEBUG Searching for a compatible version of typing-inspection (>=0.4.0) 2025-09-07T09:15:14.0098049Z #43 0.632 DEBUG Selecting: typing-inspection==0.4.1 [compatible] (typing_inspection-0.4.1-py3-none-any.whl) 2025-09-07T09:15:14.0098947Z #43 0.632 DEBUG Adding transitive dependency for typing-inspection==0.4.1: typing-extensions>=4.12.0 2025-09-07T09:15:14.0099689Z #43 0.632 DEBUG Searching for a compatible version of jsonschema (>=4.21.1) 2025-09-07T09:15:14.0100398Z #43 0.632 DEBUG Selecting: jsonschema==4.25.1 [compatible] (jsonschema-4.25.1-py3-none-any.whl) 2025-09-07T09:15:14.0101143Z #43 0.632 DEBUG Adding transitive dependency for jsonschema==4.25.1: attrs>=22.2.0 2025-09-07T09:15:14.0101929Z #43 0.632 DEBUG Adding transitive dependency for jsonschema==4.25.1: jsonschema-specifications>=2023.3.6 2025-09-07T09:15:14.0102758Z #43 0.632 DEBUG Adding transitive dependency for jsonschema==4.25.1: referencing>=0.28.4 2025-09-07T09:15:14.0103472Z #43 0.632 DEBUG Adding transitive dependency for jsonschema==4.25.1: rpds-py>=0.7.1 2025-09-07T09:15:14.0104287Z #43 0.632 DEBUG Searching for a compatible version of pydantic-extra-types[pycountry] (>=2.10.5) 2025-09-07T09:15:14.0105275Z #43 0.632 DEBUG Selecting: pydantic-extra-types==2.10.5 [compatible] (pydantic_extra_types-2.10.5-py3-none-any.whl) 2025-09-07T09:15:14.0106219Z #43 0.632 DEBUG Adding transitive dependency for pydantic-extra-types==2.10.5: pydantic-extra-types==2.10.5 2025-09-07T09:15:14.0107193Z #43 0.632 DEBUG Adding transitive dependency for pydantic-extra-types==2.10.5: pydantic-extra-types[pycountry]==2.10.5 2025-09-07T09:15:14.0108040Z #43 0.632 DEBUG 
Searching for a compatible version of pydantic-extra-types (==2.10.5) 2025-09-07T09:15:14.0108865Z #43 0.632 DEBUG Selecting: pydantic-extra-types==2.10.5 [compatible] (pydantic_extra_types-2.10.5-py3-none-any.whl) 2025-09-07T09:15:14.0109766Z #43 0.632 DEBUG Found not-modified response for: https://pypi.org/simple/fastapi-cloud-cli/ 2025-09-07T09:15:14.0110547Z #43 0.633 DEBUG Found stale response for: https://pypi.org/simple/jsonschema-specifications/ 2025-09-07T09:15:14.0111367Z #43 0.633 DEBUG Sending revalidation request for: https://pypi.org/simple/jsonschema-specifications/ 2025-09-07T09:15:14.0112106Z #43 0.633 DEBUG Found stale response for: https://pypi.org/simple/referencing/ 2025-09-07T09:15:14.0112800Z #43 0.633 DEBUG Sending revalidation request for: https://pypi.org/simple/referencing/ 2025-09-07T09:15:14.0113556Z #43 0.633 DEBUG Found not-modified response for: https://pypi.org/simple/dnspython/ 2025-09-07T09:15:14.0114273Z #43 0.633 DEBUG Found not-modified response for: https://pypi.org/simple/rich-toolkit/ 2025-09-07T09:15:14.0114965Z #43 0.633 DEBUG Found not-modified response for: https://pypi.org/simple/typer/ 2025-09-07T09:15:14.0115631Z #43 0.633 DEBUG Found not-modified response for: https://pypi.org/simple/httpcore/ 2025-09-07T09:15:14.0116995Z #43 0.633 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/70/1a/5f4fd9e7285f10c44095a4f9fe17d0f358d1702a7c74a9278c794e8a7537/pydantic_extra_types-2.10.5-py3-none-any.whl.metadata 2025-09-07T09:15:14.0118308Z #43 0.633 DEBUG Found stale response for: https://pypi.org/simple/websockets/ 2025-09-07T09:15:14.0119000Z #43 0.633 DEBUG Sending revalidation request for: https://pypi.org/simple/websockets/ 2025-09-07T09:15:14.0119769Z #43 0.633 DEBUG Adding transitive dependency for pydantic-extra-types==2.10.5: pydantic>=2.5.2 2025-09-07T09:15:14.0120603Z #43 0.633 DEBUG Adding transitive dependency for pydantic-extra-types==2.10.5: typing-extensions* 2025-09-07T09:15:14.0121420Z #43 
0.633 DEBUG Searching for a compatible version of pydantic-extra-types[pycountry] (==2.10.5) 2025-09-07T09:15:14.0122281Z #43 0.633 DEBUG Selecting: pydantic-extra-types==2.10.5 [compatible] (pydantic_extra_types-2.10.5-py3-none-any.whl) 2025-09-07T09:15:14.0123154Z #43 0.633 DEBUG Adding transitive dependency for pydantic-extra-types==2.10.5: pycountry>=23 2025-09-07T09:15:14.0123847Z #43 0.633 DEBUG Searching for a compatible version of soundfile (>=0.12.1) 2025-09-07T09:15:14.0124608Z #43 0.633 DEBUG Selecting: soundfile==0.13.1 [compatible] (soundfile-0.13.1-py2.py3-none-manylinux_2_28_x86_64.whl) 2025-09-07T09:15:14.0125397Z #43 0.633 DEBUG Adding transitive dependency for soundfile==0.13.1: cffi>=1.0 2025-09-07T09:15:14.0125998Z #43 0.633 DEBUG Adding transitive dependency for soundfile==0.13.1: numpy* 2025-09-07T09:15:14.0126583Z #43 0.633 DEBUG Searching for a compatible version of soxr (>=0.5.0) 2025-09-07T09:15:14.0139436Z #43 0.633 DEBUG Selecting: soxr==0.5.0.post1 [compatible] (soxr-0.5.0.post1-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0140441Z #43 0.633 DEBUG Adding transitive dependency for soxr==0.5.0.post1: numpy* 2025-09-07T09:15:14.0141037Z #43 0.633 DEBUG Searching for a compatible version of click (>=7.0) 2025-09-07T09:15:14.0141674Z #43 0.633 DEBUG Searching for a compatible version of click (>=7.0, <8.2.2 | >8.2.2) 2025-09-07T09:15:14.0142364Z #43 0.633 DEBUG Selecting: click==8.2.1 [compatible] (click-8.2.1-py3-none-any.whl) 2025-09-07T09:15:14.0143845Z #43 0.633 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e5/a6/5aa862489a2918a096166fd98d9fe86b7fd53c607678b3fa9d8c432d88d5/fastapi_cloud_cli-0.1.5-py3-none-any.whl.metadata 2025-09-07T09:15:14.0145291Z #43 0.634 DEBUG Found stale response for: https://pypi.org/simple/pycountry/ 2025-09-07T09:15:14.0145969Z #43 0.634 DEBUG Sending revalidation request for: https://pypi.org/simple/pycountry/ 2025-09-07T09:15:14.0147272Z #43 0.634 
DEBUG Found fresh response for: https://files.pythonhosted.org/packages/68/1b/e0a87d256e40e8c888847551b20a017a6b98139178505dc7ffb96f04e954/dnspython-2.7.0-py3-none-any.whl.metadata 2025-09-07T09:15:14.0149155Z #43 0.634 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c8/49/42821d55ead7b5a87c8d121edf323cb393d8579f63e933002ade900b784f/rich_toolkit-0.15.1-py3-none-any.whl.metadata 2025-09-07T09:15:14.0150500Z #43 0.634 DEBUG Found not-modified response for: https://pypi.org/simple/markupsafe/ 2025-09-07T09:15:14.0151186Z #43 0.634 DEBUG Found not-modified response for: https://pypi.org/simple/h11/ 2025-09-07T09:15:14.0151843Z #43 0.634 DEBUG Found not-modified response for: https://pypi.org/simple/uvloop/ 2025-09-07T09:15:14.0152520Z #43 0.634 DEBUG Found not-modified response for: https://pypi.org/simple/httptools/ 2025-09-07T09:15:14.0153858Z #43 0.634 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/93/72/6b3e70d32e89a5cbb6a4513726c1ae8762165b027af569289e19ec08edd8/typer-0.17.4-py3-none-any.whl.metadata 2025-09-07T09:15:14.0155678Z #43 0.634 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/85/32/10bb5764d90a8eee674e9dc6f4db6a0ab47c8c4d0d83c27f7c39ac415a4d/click-8.2.1-py3-none-any.whl.metadata 2025-09-07T09:15:14.0157523Z #43 0.634 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl.metadata 2025-09-07T09:15:14.0158786Z #43 0.634 DEBUG Searching for a compatible version of msgpack (>=1.0.0, <2.0.0) 2025-09-07T09:15:14.0159701Z #43 0.634 DEBUG Selecting: msgpack==1.1.1 [compatible] (msgpack-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0160656Z #43 0.635 DEBUG Searching for a compatible version of cupy-cuda12x{sys_platform != 'darwin'} (*) 2025-09-07T09:15:14.0161564Z #43 0.635 DEBUG Selecting: cupy-cuda12x==13.6.0 [compatible] 
(cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0162429Z #43 0.635 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: cupy-cuda12x==13.6.0 2025-09-07T09:15:14.0163286Z #43 0.635 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: cupy-cuda12x{sys_platform != 'darwin'}==13.6.0 2025-09-07T09:15:14.0164071Z #43 0.635 DEBUG Searching for a compatible version of cupy-cuda12x (==13.6.0) 2025-09-07T09:15:14.0164865Z #43 0.635 DEBUG Selecting: cupy-cuda12x==13.6.0 [compatible] (cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0165710Z #43 0.635 DEBUG Found not-modified response for: https://pypi.org/simple/python-dotenv/ 2025-09-07T09:15:14.0166516Z #43 0.635 DEBUG Found not-modified response for: https://pypi.org/simple/jsonschema-specifications/ 2025-09-07T09:15:14.0167304Z #43 0.635 DEBUG Found not-modified response for: https://pypi.org/simple/referencing/ 2025-09-07T09:15:14.0167952Z #43 0.635 DEBUG Found stale response for: https://pypi.org/simple/cffi/ 2025-09-07T09:15:14.0168578Z #43 0.635 DEBUG Sending revalidation request for: https://pypi.org/simple/cffi/ 2025-09-07T09:15:14.0169646Z #43 0.636 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T09:15:14.0171388Z #43 0.636 DEBUG No cache entry for: https://files.pythonhosted.org/packages/e0/95/d7e1295141e7d530674a3cc567e13ed0eb6b81524cb122d797ed996b5bea/cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:14.0173588Z #43 0.636 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl.metadata 2025-09-07T09:15:14.0175848Z #43 0.636 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/f7/d8/b644c44acc1368938317d76ac991c9bba1166311880bcc0ac297cb9d6bd7/httptools-0.6.4-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:14.0178180Z #43 0.636 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/5f/ed/539768cf28c661b5b068d66d96a2f155c4971a5d55684a514c1a0e0dec2f/python_dotenv-1.1.1-py3-none-any.whl.metadata 2025-09-07T09:15:14.0180256Z #43 0.636 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/01/0e/b27cdbaccf30b890c40ed1da9fd4a3593a5cf94dae54fb34f8a4b74fcd3f/jsonschema_specifications-2025.4.1-py3-none-any.whl.metadata 2025-09-07T09:15:14.0182369Z #43 0.636 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c1/b1/3baf80dc6d2b7bc27a95a67752d0208e410351e3feb4eb78de5f77454d8d/referencing-0.36.2-py3-none-any.whl.metadata 2025-09-07T09:15:14.0183734Z #43 0.636 DEBUG Found not-modified response for: https://pypi.org/simple/websockets/ 2025-09-07T09:15:14.0184471Z #43 0.637 DEBUG Found not-modified response for: https://pypi.org/simple/pycountry/ 2025-09-07T09:15:14.0185923Z #43 0.637 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/b1/ec/1fb891d8a2660716aadb2143235481d15ed1cbfe3ad669194690b0604492/pycountry-24.6.1-py3-none-any.whl.metadata 2025-09-07T09:15:14.0188166Z #43 0.637 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/14/8f/aa61f528fba38578ec553c145857a181384c72b98156f858ca5c8e82d9d3/websockets-15.0.1-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:14.0189807Z #43 0.638 DEBUG Found not-modified response for: https://pypi.org/simple/cffi/ 2025-09-07T09:15:14.0190436Z #43 0.638 DEBUG Found stale response for: https://pypi.org/simple/rpds-py/ 2025-09-07T09:15:14.0191101Z #43 0.638 DEBUG Sending revalidation request for: https://pypi.org/simple/rpds-py/ 
2025-09-07T09:15:14.0191801Z #43 0.638 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: numpy>=1.22, <2.6 2025-09-07T09:15:14.0192946Z #43 0.638 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: fastrlock>=0.5 2025-09-07T09:15:14.0193752Z #43 0.638 DEBUG Searching for a compatible version of cupy-cuda12x{sys_platform != 'darwin'} (==13.6.0) 2025-09-07T09:15:14.0194668Z #43 0.638 DEBUG Selecting: cupy-cuda12x==13.6.0 [compatible] (cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0195534Z #43 0.638 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: numpy>=1.22, <2.6 2025-09-07T09:15:14.0196259Z #43 0.638 DEBUG Adding transitive dependency for cupy-cuda12x==13.6.0: fastrlock>=0.5 2025-09-07T09:15:14.0196927Z #43 0.638 DEBUG Searching for a compatible version of sympy (>=1.13.3) 2025-09-07T09:15:14.0197756Z #43 0.638 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T09:15:14.0198541Z #43 0.638 DEBUG Selecting: sympy==1.14.0 [installed] (installed) 2025-09-07T09:15:14.0199154Z #43 0.638 DEBUG Adding transitive dependency for sympy==1.14.0: mpmath>=1.1.0, <1.4 2025-09-07T09:15:14.0199798Z #43 0.638 DEBUG Searching for a compatible version of networkx (>=2.5.1) 2025-09-07T09:15:14.0200641Z #43 0.638 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T09:15:14.0201433Z #43 0.638 DEBUG Selecting: networkx==3.5 [installed] (installed) 2025-09-07T09:15:14.0202000Z #43 0.638 DEBUG Searching for a compatible version of fsspec (>=2023.5.0) 2025-09-07T09:15:14.0202878Z #43 0.638 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=2023.5.0 2025-09-07T09:15:14.0203768Z #43 0.638 DEBUG Selecting: fsspec==2025.7.0 [installed] (installed) 2025-09-07T09:15:14.0204972Z #43 0.638 DEBUG Searching for a 
compatible version of hf-xet{platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (>=1.1.3, <2.0.0) 2025-09-07T09:15:14.0206236Z #43 0.638 DEBUG Selecting: hf-xet==1.1.9 [compatible] (hf_xet-1.1.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0207032Z #43 0.639 DEBUG Adding transitive dependency for hf-xet==1.1.9: hf-xet==1.1.9 2025-09-07T09:15:14.0208134Z #43 0.639 DEBUG Adding transitive dependency for hf-xet==1.1.9: hf-xet{platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'}==1.1.9 2025-09-07T09:15:14.0209237Z #43 0.639 DEBUG Searching for a compatible version of hf-xet (==1.1.9) 2025-09-07T09:15:14.0209981Z #43 0.639 DEBUG Selecting: hf-xet==1.1.9 [compatible] (hf_xet-1.1.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0210754Z #43 0.639 DEBUG No cache entry for: https://pypi.org/simple/fastrlock/ 2025-09-07T09:15:14.0212148Z #43 0.639 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/b2/d5/da47df7004cb17e4955df6a43d14b3b4ae77737dff8bf7f8f333196717bf/cffi-1.17.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:14.0214682Z #43 0.639 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/81/42/7e6955cf0621e87491a1fb8cad755d5c2517803cea174229b0ec00ff0166/hf_xet-1.1.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:14.0216278Z #43 0.639 DEBUG Found stale response for: https://pypi.org/simple/mpmath/ 2025-09-07T09:15:14.0216958Z #43 0.639 DEBUG Sending revalidation request for: https://pypi.org/simple/mpmath/ 2025-09-07T09:15:14.0218112Z #43 0.639 DEBUG Searching for a compatible version of hf-xet{platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'} (==1.1.9) 
2025-09-07T09:15:14.0219385Z #43 0.639 DEBUG Selecting: hf-xet==1.1.9 [compatible] (hf_xet-1.1.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0220181Z #43 0.639 DEBUG Searching for a compatible version of typer (>=0.15.1) 2025-09-07T09:15:14.0220851Z #43 0.639 DEBUG Selecting: typer==0.17.4 [compatible] (typer-0.17.4-py3-none-any.whl) 2025-09-07T09:15:14.0221526Z #43 0.639 DEBUG Adding transitive dependency for typer==0.17.4: click>=8.0.0 2025-09-07T09:15:14.0222231Z #43 0.639 DEBUG Adding transitive dependency for typer==0.17.4: typing-extensions>=3.7.4.3 2025-09-07T09:15:14.0222951Z #43 0.639 DEBUG Adding transitive dependency for typer==0.17.4: shellingham>=1.3.0 2025-09-07T09:15:14.0223617Z #43 0.639 DEBUG Adding transitive dependency for typer==0.17.4: rich>=10.11.0 2025-09-07T09:15:14.0224242Z #43 0.639 DEBUG Searching for a compatible version of rich-toolkit (>=0.14.8) 2025-09-07T09:15:14.0225082Z #43 0.639 DEBUG Selecting: rich-toolkit==0.15.1 [compatible] (rich_toolkit-0.15.1-py3-none-any.whl) 2025-09-07T09:15:14.0225822Z #43 0.639 DEBUG Adding transitive dependency for rich-toolkit==0.15.1: click>=8.1.7 2025-09-07T09:15:14.0226490Z #43 0.639 DEBUG Adding transitive dependency for rich-toolkit==0.15.1: rich>=13.7.1 2025-09-07T09:15:14.0227235Z #43 0.639 DEBUG Adding transitive dependency for rich-toolkit==0.15.1: typing-extensions>=4.12.2 2025-09-07T09:15:14.0227951Z #43 0.639 DEBUG Searching for a compatible version of fastapi-cloud-cli (>=0.1.1) 2025-09-07T09:15:14.0228719Z #43 0.639 DEBUG Selecting: fastapi-cloud-cli==0.1.5 [compatible] (fastapi_cloud_cli-0.1.5-py3-none-any.whl) 2025-09-07T09:15:14.0229519Z #43 0.639 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: typer>=0.12.3 2025-09-07T09:15:14.0230292Z #43 0.639 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: uvicorn[standard]>=0.15.0 2025-09-07T09:15:14.0231115Z #43 0.639 DEBUG Adding transitive dependency for 
fastapi-cloud-cli==0.1.5: rignore>=0.5.1 2025-09-07T09:15:14.0231834Z #43 0.639 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: httpx>=0.27.0 2025-09-07T09:15:14.0232597Z #43 0.639 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: rich-toolkit>=0.14.5 2025-09-07T09:15:14.0233393Z #43 0.639 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: pydantic[email]>=1.6.1 2025-09-07T09:15:14.0234190Z #43 0.639 DEBUG Adding transitive dependency for fastapi-cloud-cli==0.1.5: sentry-sdk>=2.20.0 2025-09-07T09:15:14.0234905Z #43 0.639 DEBUG Searching for a compatible version of pydantic[email] (>=1.6.1) 2025-09-07T09:15:14.0235583Z #43 0.639 DEBUG Selecting: pydantic==2.11.7 [compatible] (pydantic-2.11.7-py3-none-any.whl) 2025-09-07T09:15:14.0236336Z #43 0.639 DEBUG Adding transitive dependency for pydantic==2.11.7: pydantic==2.11.7 2025-09-07T09:15:14.0237036Z #43 0.639 DEBUG Adding transitive dependency for pydantic==2.11.7: pydantic[email]==2.11.7 2025-09-07T09:15:14.0237735Z #43 0.639 DEBUG Searching for a compatible version of pydantic[email] (==2.11.7) 2025-09-07T09:15:14.0238429Z #43 0.639 DEBUG Selecting: pydantic==2.11.7 [compatible] (pydantic-2.11.7-py3-none-any.whl) 2025-09-07T09:15:14.0239151Z #43 0.639 DEBUG Adding transitive dependency for pydantic==2.11.7: email-validator>=2.0.0 2025-09-07T09:15:14.0239884Z #43 0.639 DEBUG Found stale response for: https://pypi.org/simple/shellingham/ 2025-09-07T09:15:14.0240570Z #43 0.639 DEBUG Sending revalidation request for: https://pypi.org/simple/shellingham/ 2025-09-07T09:15:14.0241275Z #43 0.639 DEBUG Searching for a compatible version of httpcore (>=1.dev0, <2.dev0) 2025-09-07T09:15:14.0241967Z #43 0.639 DEBUG Selecting: httpcore==1.0.9 [compatible] (httpcore-1.0.9-py3-none-any.whl) 2025-09-07T09:15:14.0242625Z #43 0.639 DEBUG Adding transitive dependency for httpcore==1.0.9: certifi* 2025-09-07T09:15:14.0243232Z #43 0.639 DEBUG Adding transitive dependency for 
httpcore==1.0.9: h11>=0.16 2025-09-07T09:15:14.0243821Z #43 0.639 DEBUG Searching for a compatible version of markupsafe (>=2.0) 2025-09-07T09:15:14.0244860Z #43 0.639 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T09:15:14.0245853Z #43 0.639 DEBUG Selecting: markupsafe==3.0.2 [installed] (installed) 2025-09-07T09:15:14.0246463Z #43 0.639 DEBUG Searching for a compatible version of dnspython (>=2.0.0) 2025-09-07T09:15:14.0247140Z #43 0.639 DEBUG Selecting: dnspython==2.7.0 [compatible] (dnspython-2.7.0-py3-none-any.whl) 2025-09-07T09:15:14.0247771Z #43 0.639 DEBUG Searching for a compatible version of h11 (>=0.16) 2025-09-07T09:15:14.0248346Z #43 0.639 DEBUG Selecting: h11==0.16.0 [compatible] (h11-0.16.0-py3-none-any.whl) 2025-09-07T09:15:14.0248944Z #43 0.639 DEBUG Searching for a compatible version of httptools (>=0.6.3) 2025-09-07T09:15:14.0249961Z #43 0.639 DEBUG Selecting: httptools==0.6.4 [compatible] (httptools-0.6.4-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0250974Z #43 0.639 DEBUG Searching for a compatible version of python-dotenv (>=0.13) 2025-09-07T09:15:14.0251679Z #43 0.639 DEBUG Selecting: python-dotenv==1.1.1 [compatible] (python_dotenv-1.1.1-py3-none-any.whl) 2025-09-07T09:15:14.0252879Z #43 0.639 DEBUG Searching for a compatible version of uvloop{platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'} (>=0.15.1) 2025-09-07T09:15:14.0254114Z #43 0.639 DEBUG Found not-modified response for: https://pypi.org/simple/rpds-py/ 2025-09-07T09:15:14.0255008Z #43 0.639 DEBUG Selecting: uvloop==0.21.0 [compatible] (uvloop-0.21.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0255878Z #43 0.639 DEBUG Adding transitive dependency for uvloop==0.21.0: uvloop==0.21.0 
2025-09-07T09:15:14.0256947Z #43 0.639 DEBUG Adding transitive dependency for uvloop==0.21.0: uvloop{platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'}==0.21.0 2025-09-07T09:15:14.0258033Z #43 0.639 DEBUG Searching for a compatible version of uvloop (==0.21.0) 2025-09-07T09:15:14.0258847Z #43 0.639 DEBUG Selecting: uvloop==0.21.0 [compatible] (uvloop-0.21.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0259685Z #43 0.640 DEBUG Found stale response for: https://pypi.org/simple/rich/ 2025-09-07T09:15:14.0260406Z #43 0.640 DEBUG Sending revalidation request for: https://pypi.org/simple/rich/ 2025-09-07T09:15:14.0261095Z #43 0.640 DEBUG Found stale response for: https://pypi.org/simple/sentry-sdk/ 2025-09-07T09:15:14.0261806Z #43 0.640 DEBUG Sending revalidation request for: https://pypi.org/simple/sentry-sdk/ 2025-09-07T09:15:14.0262544Z #43 0.640 DEBUG Found stale response for: https://pypi.org/simple/rignore/ 2025-09-07T09:15:14.0263217Z #43 0.640 DEBUG Sending revalidation request for: https://pypi.org/simple/rignore/ 2025-09-07T09:15:14.0263920Z #43 0.641 DEBUG Found not-modified response for: https://pypi.org/simple/mpmath/ 2025-09-07T09:15:14.0265520Z #43 0.641 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/06/a7/b4e6a19925c900be9f98bec0a75e6e8f79bb53bdeb891916609ab3958967/uvloop-0.21.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:14.0267384Z #43 0.641 DEBUG Searching for a compatible version of uvloop{platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'} (==0.21.0) 2025-09-07T09:15:14.0268580Z #43 0.641 DEBUG Selecting: uvloop==0.21.0 [compatible] (uvloop-0.21.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0269392Z #43 0.641 DEBUG Searching for a compatible version of websockets (>=10.4) 2025-09-07T09:15:14.0270422Z #43 0.641 DEBUG 
Selecting: websockets==15.0.1 [compatible] (websockets-15.0.1-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0271512Z #43 0.641 DEBUG Searching for a compatible version of jsonschema-specifications (>=2023.3.6) 2025-09-07T09:15:14.0272468Z #43 0.641 DEBUG Selecting: jsonschema-specifications==2025.4.1 [compatible] (jsonschema_specifications-2025.4.1-py3-none-any.whl) 2025-09-07T09:15:14.0273507Z #43 0.641 DEBUG Adding transitive dependency for jsonschema-specifications==2025.4.1: referencing>=0.31.0 2025-09-07T09:15:14.0274270Z #43 0.641 DEBUG Searching for a compatible version of referencing (>=0.31.0) 2025-09-07T09:15:14.0274973Z #43 0.641 DEBUG Selecting: referencing==0.36.2 [compatible] (referencing-0.36.2-py3-none-any.whl) 2025-09-07T09:15:14.0275698Z #43 0.641 DEBUG Adding transitive dependency for referencing==0.36.2: attrs>=22.2.0 2025-09-07T09:15:14.0276390Z #43 0.641 DEBUG Adding transitive dependency for referencing==0.36.2: rpds-py>=0.7.0 2025-09-07T09:15:14.0277241Z #43 0.641 DEBUG Adding transitive dependency for referencing==0.36.2: typing-extensions{python_full_version < '3.13'}>=4.4.0 2025-09-07T09:15:14.0278125Z #43 0.642 DEBUG Found not-modified response for: https://pypi.org/simple/shellingham/ 2025-09-07T09:15:14.0279036Z #43 0.642 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T09:15:14.0279853Z #43 0.643 DEBUG Searching for a compatible version of rpds-py (>=0.7.1) 2025-09-07T09:15:14.0280660Z #43 0.643 DEBUG Selecting: rpds-py==0.27.1 [compatible] (rpds_py-0.27.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0282287Z #43 0.644 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/ed/7b/8f4fee9ba1fb5ec856eb22d725a4efa3deb47f769597c809e03578b0f9d9/rpds_py-0.27.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 
2025-09-07T09:15:14.0283704Z #43 0.644 DEBUG Searching for a compatible version of pycountry (>=23) 2025-09-07T09:15:14.0284371Z #43 0.644 DEBUG Selecting: pycountry==24.6.1 [compatible] (pycountry-24.6.1-py3-none-any.whl) 2025-09-07T09:15:14.0285957Z #43 0.644 DEBUG No cache entry for: https://files.pythonhosted.org/packages/80/07/cdecb7aa976f34328372f1c4efd6c9dc1b039b3cc8d3f38787d640009a25/fastrlock-0.8.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:15:14.0287407Z #43 0.644 DEBUG Searching for a compatible version of cffi (>=1.0) 2025-09-07T09:15:14.0288169Z #43 0.644 DEBUG Selecting: cffi==1.17.1 [compatible] (cffi-1.17.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0289615Z #43 0.644 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/e0/f9/0595336914c5619e5f28a1fb793285925a8cd4b432c9da0a987836c7f822/shellingham-1.5.4-py2.py3-none-any.whl.metadata 2025-09-07T09:15:14.0290920Z #43 0.644 DEBUG Adding transitive dependency for cffi==1.17.1: pycparser* 2025-09-07T09:15:14.0291496Z #43 0.644 DEBUG Searching for a compatible version of fastrlock (>=0.5) 2025-09-07T09:15:14.0292842Z #43 0.644 DEBUG Selecting: fastrlock==0.8.3 [compatible] (fastrlock-0.8.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:14.0293832Z #43 0.644 DEBUG Found not-modified response for: https://pypi.org/simple/rich/ 2025-09-07T09:15:14.0294523Z #43 0.644 DEBUG Found not-modified response for: https://pypi.org/simple/sentry-sdk/ 2025-09-07T09:15:14.0295335Z #43 0.644 DEBUG Found not-modified response for: https://pypi.org/simple/rignore/ 2025-09-07T09:15:14.0296009Z #43 0.645 DEBUG Found stale response for: https://pypi.org/simple/pycparser/ 2025-09-07T09:15:14.0296716Z #43 0.645 DEBUG Sending revalidation request for: https://pypi.org/simple/pycparser/ 2025-09-07T09:15:14.0298021Z #43 0.645 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/e3/30/3c4d035596d3cf444529e0b2953ad0466f6049528a879d27534700580395/rich-14.1.0-py3-none-any.whl.metadata 2025-09-07T09:15:14.0299943Z #43 0.645 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/07/d5/f9f4a2bf5db2ca8f692c46f3821fee1f302f1b76a0e2914aee5390fca565/sentry_sdk-2.37.0-py2.py3-none-any.whl.metadata 2025-09-07T09:15:14.0302107Z #43 0.645 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/80/c8/b91afda10bd5ca1e3a80463340b899c0dc26a7750a9f3c94f668585c7f40/rignore-0.6.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata 2025-09-07T09:15:14.0303686Z #43 0.646 DEBUG Found not-modified response for: https://pypi.org/simple/pycparser/ 2025-09-07T09:15:14.0304362Z #43 0.647 DEBUG Searching for a compatible version of mpmath (>=1.1.0, <1.4) 2025-09-07T09:15:14.0305338Z #43 0.647 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T09:15:14.0306125Z #43 0.647 DEBUG Selecting: mpmath==1.3.0 [installed] (installed) 2025-09-07T09:15:14.0306686Z #43 0.647 DEBUG Searching for a compatible version of shellingham (>=1.3.0) 2025-09-07T09:15:14.0307397Z #43 0.647 DEBUG Selecting: shellingham==1.5.4 [compatible] (shellingham-1.5.4-py2.py3-none-any.whl) 2025-09-07T09:15:14.0308102Z #43 0.647 DEBUG Searching for a compatible version of rich (>=13.7.1) 2025-09-07T09:15:14.0308689Z #43 0.647 DEBUG Selecting: rich==14.1.0 [compatible] (rich-14.1.0-py3-none-any.whl) 2025-09-07T09:15:14.0309363Z #43 0.647 DEBUG Adding transitive dependency for rich==14.1.0: markdown-it-py>=2.2.0 2025-09-07T09:15:14.0310070Z #43 0.647 DEBUG Adding transitive dependency for rich==14.1.0: pygments>=2.13.0, <3.0.0 2025-09-07T09:15:14.0311353Z #43 0.647 DEBUG Found fresh response for: 
https://files.pythonhosted.org/packages/13/a3/a812df4e2dd5696d1f351d58b8fe16a405b234ad2886a0dab9183fb78109/pycparser-2.22-py3-none-any.whl.metadata 2025-09-07T09:15:14.0312583Z #43 0.647 DEBUG Searching for a compatible version of rignore (>=0.5.1) 2025-09-07T09:15:14.0313379Z #43 0.647 DEBUG Selecting: rignore==0.6.4 [compatible] (rignore-0.6.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0314248Z #43 0.647 DEBUG Searching for a compatible version of sentry-sdk (>=2.20.0) 2025-09-07T09:15:14.0314947Z #43 0.647 DEBUG Selecting: sentry-sdk==2.37.0 [compatible] (sentry_sdk-2.37.0-py2.py3-none-any.whl) 2025-09-07T09:15:14.0315677Z #43 0.647 DEBUG Adding transitive dependency for sentry-sdk==2.37.0: urllib3>=1.26.11 2025-09-07T09:15:14.0316341Z #43 0.647 DEBUG Adding transitive dependency for sentry-sdk==2.37.0: certifi* 2025-09-07T09:15:14.0316991Z #43 0.647 DEBUG Found stale response for: https://pypi.org/simple/markdown-it-py/ 2025-09-07T09:15:14.0317723Z #43 0.647 DEBUG Sending revalidation request for: https://pypi.org/simple/markdown-it-py/ 2025-09-07T09:15:14.0318391Z #43 0.647 DEBUG Searching for a compatible version of pycparser (*) 2025-09-07T09:15:14.0319023Z #43 0.647 DEBUG Selecting: pycparser==2.22 [compatible] (pycparser-2.22-py3-none-any.whl) 2025-09-07T09:15:14.0319750Z #43 0.647 DEBUG Found stale response for: https://pypi.org/simple/pygments/ 2025-09-07T09:15:14.0320426Z #43 0.647 DEBUG Sending revalidation request for: https://pypi.org/simple/pygments/ 2025-09-07T09:15:14.0321165Z #43 0.648 DEBUG Found not-modified response for: https://pypi.org/simple/markdown-it-py/ 2025-09-07T09:15:14.0321881Z #43 0.648 DEBUG Found not-modified response for: https://pypi.org/simple/pygments/ 2025-09-07T09:15:14.0322546Z #43 0.649 DEBUG Searching for a compatible version of markdown-it-py (>=2.2.0) 2025-09-07T09:15:14.0323301Z #43 0.649 DEBUG Selecting: markdown-it-py==4.0.0 [compatible] (markdown_it_py-4.0.0-py3-none-any.whl) 
2025-09-07T09:15:14.0324658Z #43 0.649 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/94/54/e7d793b573f298e1c9013b8c4dade17d481164aa517d1d7148619c2cedbf/markdown_it_py-4.0.0-py3-none-any.whl.metadata 2025-09-07T09:15:14.0325990Z #43 0.649 DEBUG Adding transitive dependency for markdown-it-py==4.0.0: mdurl>=0.1, <1.dev0 2025-09-07T09:15:14.0326680Z #43 0.649 DEBUG Searching for a compatible version of pygments (>=2.13.0, <3.0.0) 2025-09-07T09:15:14.0327371Z #43 0.649 DEBUG Selecting: pygments==2.19.2 [compatible] (pygments-2.19.2-py3-none-any.whl) 2025-09-07T09:15:14.0328672Z #43 0.649 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl.metadata 2025-09-07T09:15:14.0329887Z #43 0.649 DEBUG Found stale response for: https://pypi.org/simple/mdurl/ 2025-09-07T09:15:14.0330553Z #43 0.649 DEBUG Sending revalidation request for: https://pypi.org/simple/mdurl/ 2025-09-07T09:15:14.0331210Z #43 0.650 DEBUG Found not-modified response for: https://pypi.org/simple/mdurl/ 2025-09-07T09:15:14.0331845Z #43 0.650 DEBUG Searching for a compatible version of mdurl (>=0.1, <1.dev0) 2025-09-07T09:15:14.0332475Z #43 0.650 DEBUG Selecting: mdurl==0.1.2 [compatible] (mdurl-0.1.2-py3-none-any.whl) 2025-09-07T09:15:14.0333986Z #43 0.651 DEBUG Found fresh response for: https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl.metadata 2025-09-07T09:15:14.0343467Z #43 0.651 DEBUG Tried 140 versions: aiohappyeyeballs 1, aiohttp 1, aiosignal 1, annotated-types 1, anyio 1, astor 1, attrs 1, blake3 1, cachetools 1, cbor2 1, certifi 1, cffi 1, charset-normalizer 1, click 1, cloudpickle 1, compressed-tensors 1, cupy-cuda12x 1, depyf 1, dill 1, diskcache 1, distro 1, dnspython 1, einops 1, email-validator 1, fastapi 1, fastapi-cli 1, fastapi-cloud-cli 1, fastrlock 1, filelock 1, 
frozendict 1, frozenlist 1, fsspec 1, gguf 1, h11 1, hf-xet 1, httpcore 1, httptools 1, httpx 1, huggingface-hub 1, idna 1, interegular 1, jinja2 1, jiter 1, jsonschema 1, jsonschema-specifications 1, lark 1, llguidance 1, llvmlite 1, lm-format-enforcer 1, markdown-it-py 1, markupsafe 1, mdurl 1, mistral-common 1, mpmath 1, msgpack 1, msgspec 1, multidict 1, networkx 1, ninja 1, numba 1, numpy 1, nvidia-cublas-cu12 1, nvidia-cuda-cupti-cu12 1, nvidia-cuda-nvrtc-cu12 1, nvidia-cuda-runtime-cu12 1, nvidia-cudnn-cu12 1, nvidia-cufft-cu12 1, nvidia-cufile-cu12 1, nvidia-curand-cu12 1, nvidia-cusolver-cu12 1, nvidia-cusparse-cu12 1, nvidia-cusparselt-cu12 1, nvidia-nccl-cu12 1, nvidia-nvjitlink-cu12 1, nvidia-nvshmem-cu12 1, nvidia-nvtx-cu12 1, openai 1, openai-harmony 1, opencv-python-headless 1, outlines-core 1, packaging 1, partial-json-parser 1, pillow 1, prometheus-client 1, prometheus-fastapi-instrumentator 1, propcache 1, protobuf 1, psutil 1, py-cpuinfo 1, pybase64 1, pycountry 1, pycparser 1, pydantic 1, pydantic-core 1, pydantic-extra-types 1, pygments 1, python-dotenv 1, python-json-logger 1, python-multipart 1, pytorch-triton 1, pyyaml 1, pyzmq 1, ray 1, referencing 1, regex 1, requests 1, rich 1, rich-toolkit 1, rignore 1, rpds-py 1, safetensors 1, scipy 1, sentencepiece 1, sentry-sdk 1, setproctitle 1, setuptools 1, shellingham 1, six 1, sniffio 1, soundfile 1, soxr 1, starlette 1, sympy 1, tiktoken 1, tokenizers 1, torch 1, tqdm 1, transformers 1, triton 1, typer 1, typing-extensions 1, typing-inspection 1, urllib3 1, uvicorn 1, uvloop 1, vllm 1, watchfiles 1, websockets 1, xgrammar 1, yarl 1 2025-09-07T09:15:14.0354929Z #43 0.651 DEBUG marker environment resolution took 0.178s 2025-09-07T09:15:14.0355360Z #43 0.653 Resolved 140 packages in 189ms 2025-09-07T09:15:14.0356162Z #43 0.653 DEBUG Requirement already installed: pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 
2025-09-07T09:15:14.0357161Z #43 0.653 DEBUG Registry requirement already cached: propcache==0.3.2 2025-09-07T09:15:14.0357741Z #43 0.654 DEBUG Registry requirement already cached: frozenlist==1.7.0 2025-09-07T09:15:14.0358325Z #43 0.654 DEBUG Registry requirement already cached: pydantic-core==2.33.2 2025-09-07T09:15:14.0358916Z #43 0.654 DEBUG Registry requirement already cached: fastapi==0.116.1 2025-09-07T09:15:14.0359462Z #43 0.654 DEBUG Identified uncached distribution: fastrlock==0.8.3 2025-09-07T09:15:14.0360016Z #43 0.654 DEBUG Registry requirement already cached: pycparser==2.22 2025-09-07T09:15:14.0360556Z #43 0.654 DEBUG Registry requirement already cached: typer==0.17.4 2025-09-07T09:15:14.0361081Z #43 0.654 DEBUG Requirement already installed: packaging==25.0 2025-09-07T09:15:14.0361624Z #43 0.654 DEBUG Registry requirement already cached: cachetools==6.2.0 2025-09-07T09:15:14.0362211Z #43 0.654 DEBUG Registry requirement already cached: annotated-types==0.7.0 2025-09-07T09:15:14.0363408Z #43 0.654 DEBUG Requirement already installed: pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:14.0365004Z #43 0.654 DEBUG Requirement already installed: nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:14.0366042Z #43 0.654 DEBUG Registry requirement already cached: pyzmq==27.0.2 2025-09-07T09:15:14.0366583Z #43 0.654 DEBUG Registry requirement already cached: numba==0.61.2 2025-09-07T09:15:14.0367125Z #43 0.654 DEBUG Registry requirement already cached: watchfiles==1.1.0 2025-09-07T09:15:14.0367707Z #43 0.654 DEBUG Registry requirement already cached: safetensors==0.6.2 2025-09-07T09:15:14.0368274Z #43 0.654 DEBUG Registry requirement already cached: starlette==0.47.3 2025-09-07T09:15:14.0369339Z #43 0.654 DEBUG Requirement already installed: 
nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:14.0370784Z #43 0.654 DEBUG Requirement already installed: nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0370988Z #43 0.654 DEBUG Registry requirement already cached: ninja==1.13.0 2025-09-07T09:15:14.0371643Z #43 0.654 DEBUG Requirement already installed: nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:14.0371926Z #43 0.654 DEBUG Registry requirement already cached: httpcore==1.0.9 2025-09-07T09:15:14.0372501Z #43 0.654 DEBUG Requirement already installed: nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:15:14.0372850Z #43 0.654 DEBUG Registry requirement already cached: aiohappyeyeballs==2.6.1 2025-09-07T09:15:14.0373245Z #43 0.654 DEBUG Registry requirement already cached: openai==1.106.1 2025-09-07T09:15:14.0373515Z #43 0.654 DEBUG Registry requirement already cached: charset-normalizer==3.4.3 2025-09-07T09:15:14.0373750Z #43 0.654 DEBUG Registry requirement already cached: referencing==0.36.2 2025-09-07T09:15:14.0373960Z #43 0.654 DEBUG Registry requirement already cached: uvicorn==0.35.0 2025-09-07T09:15:14.0374179Z #43 0.654 DEBUG Registry requirement already cached: hf-xet==1.1.9 2025-09-07T09:15:14.0374937Z #43 0.654 DEBUG Requirement already installed: nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T09:15:14.0375155Z #43 0.654 DEBUG Registry requirement already cached: pybase64==1.4.2 2025-09-07T09:15:14.0375824Z #43 0.654 DEBUG Requirement already installed: nvidia-nvtx-cu12==12.8.90 (from 
file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:14.0376049Z #43 0.654 DEBUG Registry requirement already cached: tiktoken==0.11.0 2025-09-07T09:15:14.0376300Z #43 0.654 DEBUG Registry requirement already cached: aiosignal==1.4.0 2025-09-07T09:15:14.0376533Z #43 0.654 DEBUG Registry requirement already cached: aiohttp==3.12.15 2025-09-07T09:15:14.0376735Z #43 0.654 DEBUG Registry requirement already cached: click==8.2.1 2025-09-07T09:15:14.0376966Z #43 0.654 DEBUG Registry requirement already cached: sentry-sdk==2.37.0 2025-09-07T09:15:14.0377191Z #43 0.654 DEBUG Registry requirement already cached: distro==1.9.0 2025-09-07T09:15:14.0377419Z #43 0.654 DEBUG Registry requirement already cached: tokenizers==0.22.0 2025-09-07T09:15:14.0378026Z #43 0.654 DEBUG Requirement already installed: nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:15:14.0378306Z #43 0.655 DEBUG Registry requirement already cached: typing-inspection==0.4.1 2025-09-07T09:15:14.0378508Z #43 0.655 DEBUG Registry requirement already cached: cbor2==5.7.0 2025-09-07T09:15:14.0378765Z #43 0.655 DEBUG Registry requirement already cached: certifi==2025.8.3 2025-09-07T09:15:14.0379003Z #43 0.655 DEBUG Registry requirement already cached: py-cpuinfo==9.0.0 2025-09-07T09:15:14.0379221Z #43 0.655 DEBUG Registry requirement already cached: requests==2.32.5 2025-09-07T09:15:14.0379428Z #43 0.655 DEBUG Registry requirement already cached: urllib3==2.5.0 2025-09-07T09:15:14.0379645Z #43 0.655 DEBUG Registry requirement already cached: msgspec==0.19.0 2025-09-07T09:15:14.0379858Z #43 0.655 DEBUG Registry requirement already cached: tqdm==4.67.1 2025-09-07T09:15:14.0380059Z #43 0.655 DEBUG Registry requirement already cached: yarl==1.20.1 2025-09-07T09:15:14.0380305Z #43 0.655 DEBUG Registry requirement already cached: openai-harmony==0.0.4 2025-09-07T09:15:14.0380514Z #43 
0.655 DEBUG Registry requirement already cached: h11==0.16.0 2025-09-07T09:15:14.0380746Z #43 0.655 DEBUG Registry requirement already cached: shellingham==1.5.4 2025-09-07T09:15:14.0381494Z #43 0.655 DEBUG Requirement already installed: nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:14.0381814Z #43 0.655 DEBUG Registry requirement already cached: opencv-python-headless==4.12.0.88 2025-09-07T09:15:14.0382326Z #43 0.655 DEBUG Requirement already installed: typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T09:15:14.0382574Z #43 0.655 DEBUG Registry requirement already cached: transformers==4.56.1 2025-09-07T09:15:14.0382985Z #43 0.655 DEBUG Requirement already installed: mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T09:15:14.0383229Z #43 0.655 DEBUG Registry requirement already cached: astor==0.8.1 2025-09-07T09:15:14.0383498Z #43 0.655 DEBUG Registry requirement already cached: lm-format-enforcer==0.11.3 2025-09-07T09:15:14.0383730Z #43 0.655 DEBUG Registry requirement already cached: pygments==2.19.2 2025-09-07T09:15:14.0383977Z #43 0.655 DEBUG Registry requirement already cached: outlines-core==0.2.10 2025-09-07T09:15:14.0384184Z #43 0.655 DEBUG Registry requirement already cached: anyio==4.10.0 2025-09-07T09:15:14.0384413Z #43 0.655 DEBUG Registry requirement already cached: interegular==0.3.3 2025-09-07T09:15:14.0384631Z #43 0.655 DEBUG Registry requirement already cached: blake3==1.0.5 2025-09-07T09:15:14.0384924Z #43 0.655 DEBUG Registry requirement already cached: fastapi-cloud-cli==0.1.5 2025-09-07T09:15:14.0385418Z #43 0.655 DEBUG Requirement already installed: jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T09:15:14.0385649Z #43 0.655 DEBUG Registry requirement already cached: protobuf==6.32.0 2025-09-07T09:15:14.0385846Z #43 0.655 DEBUG 
Registry requirement already cached: scipy==1.16.1 2025-09-07T09:15:14.0386044Z #43 0.655 DEBUG Registry requirement already cached: pyyaml==6.0.2 2025-09-07T09:15:14.0386252Z #43 0.655 DEBUG Registry requirement already cached: mdurl==0.1.2 2025-09-07T09:15:14.0386493Z #43 0.655 DEBUG Registry requirement already cached: xgrammar==0.1.23 2025-09-07T09:15:14.0386714Z #43 0.655 DEBUG Identified uncached distribution: cupy-cuda12x==13.6.0 2025-09-07T09:15:14.0386965Z #43 0.655 DEBUG Registry requirement already cached: markdown-it-py==4.0.0 2025-09-07T09:15:14.0387173Z #43 0.656 DEBUG Registry requirement already cached: rpds-py==0.27.1 2025-09-07T09:15:14.0387390Z #43 0.656 DEBUG Registry requirement already cached: multidict==6.6.4 2025-09-07T09:15:14.0387689Z #43 0.656 DEBUG Registry requirement already cached: partial-json-parser==0.2.1.1.post6 2025-09-07T09:15:14.0388097Z #43 0.656 DEBUG Requirement already installed: networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T09:15:14.0388320Z #43 0.656 DEBUG Registry requirement already cached: websockets==15.0.1 2025-09-07T09:15:14.0388555Z #43 0.656 DEBUG Registry requirement already cached: python-dotenv==1.1.1 2025-09-07T09:15:14.0389001Z #43 0.656 DEBUG Requirement already installed: fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T09:15:14.0389211Z #43 0.656 DEBUG Registry requirement already cached: sniffio==1.3.1 2025-09-07T09:15:14.0389425Z #43 0.656 DEBUG Registry requirement already cached: httptools==0.6.4 2025-09-07T09:15:14.0389696Z #43 0.656 DEBUG Registry requirement already cached: prometheus-client==0.22.1 2025-09-07T09:15:14.0720529Z #43 0.656 DEBUG Registry requirement already cached: lark==1.2.2 2025-09-07T09:15:14.0721276Z #43 0.656 DEBUG Requirement already installed: nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:15:14.0721699Z #43 0.656 DEBUG Registry 
requirement already cached: prometheus-fastapi-instrumentator==7.1.0 2025-09-07T09:15:14.0721984Z #43 0.656 DEBUG Registry requirement already cached: sentencepiece==0.2.1 2025-09-07T09:15:14.0722252Z #43 0.656 DEBUG Registry requirement already cached: dill==0.4.0 2025-09-07T09:15:14.0722525Z #43 0.656 DEBUG Registry requirement already cached: python-json-logger==3.3.0 2025-09-07T09:15:14.0722748Z #43 0.656 DEBUG Registry requirement already cached: frozendict==2.4.6 2025-09-07T09:15:14.0722977Z #43 0.656 DEBUG Registry requirement already cached: pydantic==2.11.7 2025-09-07T09:15:14.0723181Z #43 0.656 DEBUG Registry requirement already cached: einops==0.8.1 2025-09-07T09:15:14.0723822Z #43 0.656 DEBUG Requirement already installed: nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:14.0724076Z #43 0.656 DEBUG Registry requirement already cached: fastapi-cli==0.0.10 2025-09-07T09:15:14.0724507Z #43 0.656 DEBUG Registry requirement already cached: mistral-common==1.8.4 2025-09-07T09:15:14.0725112Z #43 0.656 DEBUG Requirement already installed: markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:14.0725328Z #43 0.656 DEBUG Registry requirement already cached: httpx==0.28.1 2025-09-07T09:15:14.0726008Z #43 0.656 DEBUG Requirement already installed: nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:14.0726207Z #43 0.656 DEBUG Registry requirement already cached: rich==14.1.0 2025-09-07T09:15:14.0726403Z #43 0.656 DEBUG Registry requirement already cached: cffi==1.17.1 2025-09-07T09:15:14.0726721Z #43 0.656 DEBUG Registry requirement already cached: email-validator==2.3.0 2025-09-07T09:15:14.0727104Z #43 0.656 DEBUG Requirement already installed: sympy==1.14.0 (from 
file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T09:15:14.0727315Z #43 0.656 DEBUG Registry requirement already cached: rignore==0.6.4 2025-09-07T09:15:14.0727529Z #43 0.656 DEBUG Registry requirement already cached: jiter==0.10.0 2025-09-07T09:15:14.0727803Z #43 0.656 DEBUG Registry requirement already cached: pydantic-extra-types==2.10.5 2025-09-07T09:15:14.0728030Z #43 0.656 DEBUG Registry requirement already cached: llguidance==0.7.30 2025-09-07T09:15:14.0728302Z #43 0.656 DEBUG Registry requirement already cached: triton==3.4.0 2025-09-07T09:15:14.0728506Z #43 0.657 DEBUG Registry requirement already cached: psutil==7.0.0 2025-09-07T09:15:14.0728728Z #43 0.657 DEBUG Registry requirement already cached: uvloop==0.21.0 2025-09-07T09:15:14.0729062Z #43 0.657 DEBUG Registry requirement already cached: regex==2025.9.1 2025-09-07T09:15:14.0729252Z #43 0.657 DEBUG Identified uncached distribution: ray==2.49.1 2025-09-07T09:15:14.0729449Z #43 0.657 DEBUG Registry requirement already cached: gguf==0.17.1 2025-09-07T09:15:14.0729670Z #43 0.657 DEBUG Registry requirement already cached: soxr==0.5.0.post1 2025-09-07T09:15:14.0729879Z #43 0.657 DEBUG Registry requirement already cached: idna==3.10 2025-09-07T09:15:14.0730185Z #43 0.657 DEBUG Registry requirement already cached: jsonschema-specifications==2025.4.1 2025-09-07T09:15:14.0730401Z #43 0.657 DEBUG Registry requirement already cached: diskcache==5.6.3 2025-09-07T09:15:14.0731111Z #43 0.657 DEBUG Requirement already installed: nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:15:14.0731317Z #43 0.657 DEBUG Registry requirement already cached: depyf==0.19.0 2025-09-07T09:15:14.0731542Z #43 0.657 DEBUG Registry requirement already cached: cloudpickle==3.1.1 2025-09-07T09:15:14.0731774Z #43 0.657 DEBUG Registry requirement already cached: llvmlite==0.44.0 2025-09-07T09:15:14.0732038Z #43 0.657 DEBUG Registry requirement already 
cached: compressed-tensors==0.11.0 2025-09-07T09:15:14.0732790Z #43 0.657 DEBUG Requirement already installed: nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:14.0733418Z #43 0.657 DEBUG Requirement already installed: setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T09:15:14.0733646Z #43 0.657 DEBUG Registry requirement already cached: dnspython==2.7.0 2025-09-07T09:15:14.0733879Z #43 0.657 DEBUG Registry requirement already cached: jsonschema==4.25.1 2025-09-07T09:15:14.0734128Z #43 0.657 DEBUG Registry requirement already cached: setproctitle==1.3.7 2025-09-07T09:15:14.0734382Z #43 0.657 DEBUG Registry requirement already cached: huggingface-hub==0.34.4 2025-09-07T09:15:14.0734547Z #43 0.658 DEBUG Requirement installed, but mismatched: 2025-09-07T09:15:14.0737803Z #43 0.658 Installed: Url(InstalledDirectUrlDist { name: PackageName("numpy"), version: "2.3.2", direct_url: ArchiveUrl { url: "file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", archive_info: ArchiveInfo { hash: None, hashes: None }, subdirectory: None }, url: DisplaySafeUrl { scheme: "file", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", query: None, fragment: None }, editable: false, path: "/opt/python/cp312-cp312/lib/python3.12/site-packages/numpy-2.3.2.dist-info", cache_info: Some(CacheInfo { timestamp: Some(Timestamp(SystemTime { tv_sec: 1757226019, tv_nsec: 810478714 })), commit: None, tags: None, env: {}, directories: {} }) }) 2025-09-07T09:15:14.0739539Z #43 0.658 Requested: Registry { specifier: VersionSpecifiers([VersionSpecifier { operator: Equal, version: "2.2.6" }]), index: Some(IndexMetadata { url: Pypi(VerbatimUrl { url: DisplaySafeUrl { scheme: "https", cannot_be_a_base: false, username: "", 
password: None, host: Some(Domain("pypi.org")), port: None, path: "/simple", query: None, fragment: None }, given: None }), format: Simple }), conflict: None } 2025-09-07T09:15:14.0739817Z #43 0.659 DEBUG Registry requirement already cached: numpy==2.2.6 2025-09-07T09:15:14.0740022Z #43 0.659 DEBUG Identified uncached distribution: msgpack==1.1.1 2025-09-07T09:15:14.0740228Z #43 0.659 DEBUG Registry requirement already cached: attrs==25.3.0 2025-09-07T09:15:14.0741008Z #43 0.659 DEBUG Requirement already installed: nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T09:15:14.0741212Z #43 0.659 DEBUG Registry requirement already cached: six==1.17.0 2025-09-07T09:15:14.0741442Z #43 0.659 DEBUG Registry requirement already cached: pycountry==24.6.1 2025-09-07T09:15:14.0741881Z #43 0.659 DEBUG Requirement already installed: filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T09:15:14.0742128Z #43 0.659 DEBUG Registry requirement already cached: rich-toolkit==0.15.1 2025-09-07T09:15:14.0742356Z #43 0.659 DEBUG Registry requirement already cached: soundfile==0.13.1 2025-09-07T09:15:14.0742996Z #43 0.659 DEBUG Requirement already installed: torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:15:14.0743641Z #43 0.659 DEBUG Identified uncached distribution: vllm @ file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-linux_x86_64.whl 2025-09-07T09:15:14.0743915Z #43 0.659 DEBUG Registry requirement already cached: python-multipart==0.0.20 2025-09-07T09:15:14.0744063Z #43 0.659 DEBUG Unnecessary package: build==1.3.0 2025-09-07T09:15:14.0744243Z #43 0.659 DEBUG Unnecessary package: opt-einsum==3.4.0 2025-09-07T09:15:14.0744507Z #43 0.659 DEBUG Preserving seed package: pip==25.2 2025-09-07T09:15:14.0744691Z #43 0.659 DEBUG Unnecessary package: 
pyproject-hooks==1.2.0 2025-09-07T09:15:14.0745340Z #43 0.659 DEBUG Unnecessary package: torchaudio==2.8.0.dev20250906+cu128 (from file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:15:14.0745988Z #43 0.659 DEBUG Unnecessary package: torchvision==0.24.0.dev20250906+cu128 (from file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:15:14.0746136Z #43 0.659 DEBUG Preserving seed package: uv==0.8.4 2025-09-07T09:15:14.0746292Z #43 0.659 DEBUG Unnecessary package: wheel==0.45.1 2025-09-07T09:15:14.0747219Z #43 0.662 DEBUG No cache entry for: https://files.pythonhosted.org/packages/e0/95/d7e1295141e7d530674a3cc567e13ed0eb6b81524cb122d797ed996b5bea/cupy_cuda12x-13.6.0-cp312-cp312-manylinux2014_x86_64.whl 2025-09-07T09:15:14.0748070Z #43 0.662 DEBUG No cache entry for: https://files.pythonhosted.org/packages/00/02/c81260c0f94bd34a1442ea488bdd433dfc9e6ed6211c9a59bc4157b8e00e/ray-2.49.1-cp312-cp312-manylinux2014_x86_64.whl 2025-09-07T09:15:14.0749058Z #43 0.662 DEBUG No cache entry for: https://files.pythonhosted.org/packages/4d/ec/fd869e2567cc9c01278a736cfd1697941ba0d4b81a43e0aa2e8d71dab208/msgpack-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl 2025-09-07T09:15:14.0750174Z #43 0.662 DEBUG No cache entry for: https://files.pythonhosted.org/packages/80/07/cdecb7aa976f34328372f1c4efd6c9dc1b039b3cc8d3f38787d640009a25/fastrlock-0.8.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:15:14.0750311Z #43 0.667 Downloading cupy-cuda12x (107.7MiB) 2025-09-07T09:15:14.0750442Z #43 0.667 Downloading ray (66.9MiB) 2025-09-07T09:15:15.4818978Z #43 2.228 Downloading cupy-cuda12x 2025-09-07T09:15:15.6575420Z #43 2.253 Downloading ray 2025-09-07T09:15:16.4823059Z #43 3.228 Prepared 5 packages in 2.56s 2025-09-07T09:15:16.6700261Z #43 3.265 DEBUG Uninstalled numpy (907 files, 90 directories) 2025-09-07T09:15:16.6700787Z #43 3.265 
Uninstalled 1 package in 36ms 2025-09-07T09:15:17.2498238Z #43 3.996 Installed 112 packages in 730ms 2025-09-07T09:15:17.4102775Z #43 3.996 + aiohappyeyeballs==2.6.1 2025-09-07T09:15:17.4103685Z #43 3.996 + aiohttp==3.12.15 2025-09-07T09:15:17.4104096Z #43 3.996 + aiosignal==1.4.0 2025-09-07T09:15:17.4104998Z #43 3.996 + annotated-types==0.7.0 2025-09-07T09:15:17.4105581Z #43 3.996 + anyio==4.10.0 2025-09-07T09:15:17.4106032Z #43 3.996 + astor==0.8.1 2025-09-07T09:15:17.4106376Z #43 3.996 + attrs==25.3.0 2025-09-07T09:15:17.4106654Z #43 3.996 + blake3==1.0.5 2025-09-07T09:15:17.4106974Z #43 3.996 + cachetools==6.2.0 2025-09-07T09:15:17.4109762Z #43 3.996 + cbor2==5.7.0 2025-09-07T09:15:17.4110353Z #43 3.996 + certifi==2025.8.3 2025-09-07T09:15:17.4110808Z #43 3.996 + cffi==1.17.1 2025-09-07T09:15:17.4111331Z #43 3.996 + charset-normalizer==3.4.3 2025-09-07T09:15:17.4111931Z #43 3.996 + click==8.2.1 2025-09-07T09:15:17.4112355Z #43 3.996 + cloudpickle==3.1.1 2025-09-07T09:15:17.4112875Z #43 3.996 + compressed-tensors==0.11.0 2025-09-07T09:15:17.4113412Z #43 3.996 + cupy-cuda12x==13.6.0 2025-09-07T09:15:17.4113903Z #43 3.997 + depyf==0.19.0 2025-09-07T09:15:17.4114356Z #43 3.997 + dill==0.4.0 2025-09-07T09:15:17.4114867Z #43 3.997 + diskcache==5.6.3 2025-09-07T09:15:17.4115364Z #43 3.997 + distro==1.9.0 2025-09-07T09:15:17.4115815Z #43 3.997 + dnspython==2.7.0 2025-09-07T09:15:17.4116255Z #43 3.997 + einops==0.8.1 2025-09-07T09:15:17.4116675Z #43 3.997 + email-validator==2.3.0 2025-09-07T09:15:17.4117099Z #43 3.997 + fastapi==0.116.1 2025-09-07T09:15:17.4117498Z #43 3.997 + fastapi-cli==0.0.10 2025-09-07T09:15:17.4118067Z #43 3.997 + fastapi-cloud-cli==0.1.5 2025-09-07T09:15:17.4118527Z #43 3.997 + fastrlock==0.8.3 2025-09-07T09:15:17.4118940Z #43 3.997 + frozendict==2.4.6 2025-09-07T09:15:17.4119327Z #43 3.997 + frozenlist==1.7.0 2025-09-07T09:15:17.4119716Z #43 3.997 + gguf==0.17.1 2025-09-07T09:15:17.4120067Z #43 3.997 + h11==0.16.0 2025-09-07T09:15:17.4120441Z 
#43 3.997 + hf-xet==1.1.9 2025-09-07T09:15:17.4120809Z #43 3.997 + httpcore==1.0.9 2025-09-07T09:15:17.4121199Z #43 3.997 + httptools==0.6.4 2025-09-07T09:15:17.4121586Z #43 3.997 + httpx==0.28.1 2025-09-07T09:15:17.4121988Z #43 3.997 + huggingface-hub==0.34.4 2025-09-07T09:15:17.4122408Z #43 3.997 + idna==3.10 2025-09-07T09:15:17.4122794Z #43 3.997 + interegular==0.3.3 2025-09-07T09:15:17.4123205Z #43 3.997 + jiter==0.10.0 2025-09-07T09:15:17.4123628Z #43 3.997 + jsonschema==4.25.1 2025-09-07T09:15:17.4124110Z #43 3.997 + jsonschema-specifications==2025.4.1 2025-09-07T09:15:17.4124667Z #43 3.997 + lark==1.2.2 2025-09-07T09:15:17.4125138Z #43 3.998 + llguidance==0.7.30 2025-09-07T09:15:17.4125471Z #43 3.998 + llvmlite==0.44.0 2025-09-07T09:15:17.4125806Z #43 3.998 + lm-format-enforcer==0.11.3 2025-09-07T09:15:17.4126156Z #43 3.998 + markdown-it-py==4.0.0 2025-09-07T09:15:17.4126486Z #43 3.998 + mdurl==0.1.2 2025-09-07T09:15:17.4126769Z #43 3.998 + mistral-common==1.8.4 2025-09-07T09:15:17.4127097Z #43 3.998 + msgpack==1.1.1 2025-09-07T09:15:17.4127400Z #43 3.998 + msgspec==0.19.0 2025-09-07T09:15:17.4127691Z #43 3.998 + multidict==6.6.4 2025-09-07T09:15:17.4128000Z #43 3.998 + ninja==1.13.0 2025-09-07T09:15:17.4128389Z #43 3.998 + numba==0.61.2 2025-09-07T09:15:17.4128998Z #43 3.998 - numpy==2.3.2 (from file:///dist/numpy-2.3.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:17.4129616Z #43 3.998 + numpy==2.2.6 2025-09-07T09:15:17.4129910Z #43 3.998 + openai==1.106.1 2025-09-07T09:15:17.4130210Z #43 3.998 + openai-harmony==0.0.4 2025-09-07T09:15:17.4130583Z #43 3.998 + opencv-python-headless==4.12.0.88 2025-09-07T09:15:17.4130981Z #43 3.998 + outlines-core==0.2.10 2025-09-07T09:15:17.4131342Z #43 3.998 + partial-json-parser==0.2.1.1.post6 2025-09-07T09:15:17.4131737Z #43 3.998 + prometheus-client==0.22.1 2025-09-07T09:15:17.4132136Z #43 3.998 + prometheus-fastapi-instrumentator==7.1.0 2025-09-07T09:15:17.4132684Z #43 3.998 + 
propcache==0.3.2 2025-09-07T09:15:17.4133075Z #43 3.998 + protobuf==6.32.0 2025-09-07T09:15:17.4133384Z #43 3.998 + psutil==7.0.0 2025-09-07T09:15:17.4133675Z #43 3.998 + py-cpuinfo==9.0.0 2025-09-07T09:15:17.4133994Z #43 3.998 + pybase64==1.4.2 2025-09-07T09:15:17.4134293Z #43 3.998 + pycountry==24.6.1 2025-09-07T09:15:17.4134608Z #43 3.998 + pycparser==2.22 2025-09-07T09:15:17.4134919Z #43 3.998 + pydantic==2.11.7 2025-09-07T09:15:17.4135223Z #43 3.998 + pydantic-core==2.33.2 2025-09-07T09:15:17.4135579Z #43 3.998 + pydantic-extra-types==2.10.5 2025-09-07T09:15:17.4135929Z #43 3.998 + pygments==2.19.2 2025-09-07T09:15:17.4136246Z #43 3.998 + python-dotenv==1.1.1 2025-09-07T09:15:17.4136618Z #43 3.999 + python-json-logger==3.3.0 2025-09-07T09:15:17.4136984Z #43 3.999 + python-multipart==0.0.20 2025-09-07T09:15:17.4137310Z #43 3.999 + pyyaml==6.0.2 2025-09-07T09:15:17.4137601Z #43 3.999 + pyzmq==27.0.2 2025-09-07T09:15:17.4137871Z #43 3.999 + ray==2.49.1 2025-09-07T09:15:17.4138161Z #43 3.999 + referencing==0.36.2 2025-09-07T09:15:17.4138484Z #43 3.999 + regex==2025.9.1 2025-09-07T09:15:17.4138773Z #43 3.999 + requests==2.32.5 2025-09-07T09:15:17.4139077Z #43 3.999 + rich==14.1.0 2025-09-07T09:15:17.4139356Z #43 3.999 + rich-toolkit==0.15.1 2025-09-07T09:15:17.4139683Z #43 3.999 + rignore==0.6.4 2025-09-07T09:15:17.4139971Z #43 3.999 + rpds-py==0.27.1 2025-09-07T09:15:17.4140277Z #43 3.999 + safetensors==0.6.2 2025-09-07T09:15:17.4140575Z #43 3.999 + scipy==1.16.1 2025-09-07T09:15:17.4140879Z #43 3.999 + sentencepiece==0.2.1 2025-09-07T09:15:17.4141192Z #43 3.999 + sentry-sdk==2.37.0 2025-09-07T09:15:17.4141507Z #43 3.999 + setproctitle==1.3.7 2025-09-07T09:15:17.4141866Z #43 3.999 + shellingham==1.5.4 2025-09-07T09:15:17.4142159Z #43 3.999 + six==1.17.0 2025-09-07T09:15:17.4142441Z #43 3.999 + sniffio==1.3.1 2025-09-07T09:15:17.4142726Z #43 3.999 + soundfile==0.13.1 2025-09-07T09:15:17.4143030Z #43 3.999 + soxr==0.5.0.post1 2025-09-07T09:15:17.4143321Z #43 3.999 + 
starlette==0.47.3 2025-09-07T09:15:17.4143627Z #43 3.999 + tiktoken==0.11.0 2025-09-07T09:15:17.4143958Z #43 4.000 + tokenizers==0.22.0 2025-09-07T09:15:17.4144282Z #43 4.000 + tqdm==4.67.1 2025-09-07T09:15:17.4144703Z #43 4.000 + transformers==4.56.1 2025-09-07T09:15:17.4145019Z #43 4.000 + triton==3.4.0 2025-09-07T09:15:17.4145302Z #43 4.000 + typer==0.17.4 2025-09-07T09:15:17.4145591Z #43 4.000 + typing-inspection==0.4.1 2025-09-07T09:15:17.4145923Z #43 4.000 + urllib3==2.5.0 2025-09-07T09:15:17.4146199Z #43 4.000 + uvicorn==0.35.0 2025-09-07T09:15:17.4146491Z #43 4.000 + uvloop==0.21.0 2025-09-07T09:15:17.4147232Z #43 4.000 + vllm==0.10.2rc2.dev125+g4172235ab.d20250907 (from file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-linux_x86_64.whl) 2025-09-07T09:15:17.4148022Z #43 4.000 + watchfiles==1.1.0 2025-09-07T09:15:17.4148313Z #43 4.000 + websockets==15.0.1 2025-09-07T09:15:17.4148617Z #43 4.000 + xgrammar==0.1.23 2025-09-07T09:15:17.4148910Z #43 4.000 + yarl==1.20.1 2025-09-07T09:15:17.4149268Z #43 4.004 DEBUG Released lock at `/tmp/uv-281d6a3886c08524.lock` 2025-09-07T09:15:53.5796558Z #43 DONE 40.3s 2025-09-07T09:15:53.7332732Z 2025-09-07T09:15:53.7334473Z #44 [vllm-base 12/18] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system /wheels/xformers/*.whl --verbose 2025-09-07T09:15:54.1176824Z #44 0.536 DEBUG uv 0.8.4 2025-09-07T09:15:54.2929739Z #44 0.536 DEBUG Searching for default Python interpreter in managed installations or search path 2025-09-07T09:15:54.2932118Z #44 0.536 DEBUG Searching for managed installations at `/root/.local/share/uv/python` 2025-09-07T09:15:54.2933493Z #44 0.538 DEBUG Found `cpython-3.12.11-linux-x86_64-gnu` at `/opt/python/cp312-cp312/bin/python` (first executable in the search path) 2025-09-07T09:15:54.2934369Z #44 0.538 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T09:15:54.2934918Z #44 0.539 DEBUG Acquired lock for `/opt/python/cp312-cp312` 
2025-09-07T09:15:54.2935806Z #44 0.544 DEBUG At least one requirement is not satisfied: file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T09:15:54.2936801Z #44 0.544 DEBUG Using request timeout of 500s 2025-09-07T09:15:54.2937271Z #44 0.550 DEBUG Solving with installed Python version: 3.12.11 2025-09-07T09:15:54.2937785Z #44 0.550 DEBUG Solving with target Python version: >=3.12.11 2025-09-07T09:15:54.2938271Z #44 0.550 DEBUG Adding direct dependency: xformers* 2025-09-07T09:15:54.2939169Z #44 0.550 DEBUG Searching for a compatible version of xformers @ file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl (*) 2025-09-07T09:15:54.2940303Z #44 0.550 DEBUG Adding transitive dependency for xformers==0.0.33+5d4b92a5.d20250907: torch>=2.8 2025-09-07T09:15:54.2941096Z #44 0.550 DEBUG Adding transitive dependency for xformers==0.0.33+5d4b92a5.d20250907: numpy* 2025-09-07T09:15:54.2941801Z #44 0.551 DEBUG Found fresh response for: https://pypi.org/simple/torch/ 2025-09-07T09:15:54.2942400Z #44 0.551 DEBUG Searching for a compatible version of torch (>=2.8) 2025-09-07T09:15:54.2943469Z #44 0.551 DEBUG Found installed version of torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.8 2025-09-07T09:15:54.2945109Z #44 0.551 DEBUG Found installed version of torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.8 2025-09-07T09:15:54.2946177Z #44 0.551 DEBUG Selecting: torch==2.9.0.dev20250906+cu128 [installed] (installed) 2025-09-07T09:15:54.2946904Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: filelock* 2025-09-07T09:15:54.2947706Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: typing-extensions>=4.10.0 2025-09-07T09:15:54.2948643Z #44 0.551 DEBUG Adding transitive 
dependency for torch==2.9.0.dev20250906+cu128: setuptools{python_full_version >= '3.12'}* 2025-09-07T09:15:54.2949515Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: sympy>=1.13.3 2025-09-07T09:15:54.2950294Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: networkx>=2.5.1 2025-09-07T09:15:54.2951031Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: jinja2* 2025-09-07T09:15:54.2951766Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: fsspec>=0.8.5 2025-09-07T09:15:54.2952896Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.93, <12.8.93+ 2025-09-07T09:15:54.2954383Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T09:15:54.2955872Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T09:15:54.2957328Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=9.10.2.21, <9.10.2.21+ 2025-09-07T09:15:54.2958825Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.4.1, <12.8.4.1+ 2025-09-07T09:15:54.2960276Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.3.3.83, <11.3.3.83+ 2025-09-07T09:15:54.2961734Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-curand-cu12{platform_machine == 
'x86_64' and sys_platform == 'linux'}>=10.3.9.90, <10.3.9.90+ 2025-09-07T09:15:54.2963182Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.7.3.90, <11.7.3.90+ 2025-09-07T09:15:54.2964698Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.5.8.93, <12.5.8.93+ 2025-09-07T09:15:54.2966169Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=0.7.1, <0.7.1+ 2025-09-07T09:15:54.2967614Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=2.27.5, <2.27.5+ 2025-09-07T09:15:54.2969035Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=3.3.20, <3.3.20+ 2025-09-07T09:15:54.2970443Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T09:15:54.2971882Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.93, <12.8.93+ 2025-09-07T09:15:54.2973645Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=1.13.1.3, <1.13.1.3+ 2025-09-07T09:15:54.2975140Z #44 0.551 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T09:15:54.2976193Z #44 0.553 DEBUG Found fresh response 
for: https://pypi.org/simple/filelock/ 2025-09-07T09:15:54.2976881Z #44 0.553 DEBUG Found fresh response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T09:15:54.2977915Z #44 0.553 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T09:15:54.2979116Z #44 0.553 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T09:15:54.2979953Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/sympy/ 2025-09-07T09:15:54.2980583Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/networkx/ 2025-09-07T09:15:54.2981210Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/jinja2/ 2025-09-07T09:15:54.2981813Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/fsspec/ 2025-09-07T09:15:54.2982515Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T09:15:54.2983282Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T09:15:54.2984333Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.93, <12.8.93+) 2025-09-07T09:15:54.2985986Z #44 0.554 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.8.93, <12.8.93+ 2025-09-07T09:15:54.2987225Z #44 0.554 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:54.2987940Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T09:15:54.2988800Z #44 0.554 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.8.93: nvidia-cuda-nvrtc-cu12==12.8.93 2025-09-07T09:15:54.2990000Z #44 0.554 
DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.8.93: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.93 2025-09-07T09:15:54.2991070Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T09:15:54.2991776Z #44 0.554 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12 (==12.8.93) 2025-09-07T09:15:54.2993607Z #44 0.554 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:54.2994831Z #44 0.554 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:54.2995544Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T09:15:54.2996287Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T09:15:54.2997081Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T09:15:54.2997826Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T09:15:54.2998563Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T09:15:54.2999327Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T09:15:54.3000071Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T09:15:54.3000790Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T09:15:54.3001519Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T09:15:54.3002244Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T09:15:54.3003057Z #44 0.554 DEBUG Found fresh response for: 
https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T09:15:54.3003777Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T09:15:54.3004659Z #44 0.554 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T09:15:54.3005631Z #44 0.554 DEBUG Found fresh response for: https://pypi.org/simple/setuptools/ 2025-09-07T09:15:54.3006483Z #44 0.554 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T09:15:54.3007522Z #44 0.555 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T09:15:54.3008561Z #44 0.555 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T09:15:54.3009952Z #44 0.555 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:54.3011101Z #44 0.555 DEBUG Found fresh response for: https://pypi.org/simple/numpy/ 2025-09-07T09:15:54.3012018Z #44 0.555 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.93) 2025-09-07T09:15:54.3013755Z #44 0.555 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:54.3015035Z #44 0.555 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:54.3016043Z #44 0.555 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T09:15:54.3017647Z #44 0.555 DEBUG 
Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T09:15:54.3018935Z #44 0.555 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:54.3019812Z #44 0.555 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.8.90: nvidia-cuda-runtime-cu12==12.8.90 2025-09-07T09:15:54.3021151Z #44 0.555 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.8.90: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T09:15:54.3022305Z #44 0.555 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12 (==12.8.90) 2025-09-07T09:15:54.3023579Z #44 0.555 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:54.3024993Z #44 0.555 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:54.3025624Z #44 0.556 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T09:15:54.3026790Z #44 0.556 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:54.3028281Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T09:15:54.3029756Z #44 0.556 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:54.3030965Z #44 0.556 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 
[installed] (installed) 2025-09-07T09:15:54.3031987Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T09:15:54.3033487Z #44 0.556 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T09:15:54.3034701Z #44 0.556 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:54.3035512Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.8.90: nvidia-cuda-cupti-cu12==12.8.90 2025-09-07T09:15:54.3036717Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.8.90: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T09:15:54.3037790Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12 (==12.8.90) 2025-09-07T09:15:54.3038987Z #44 0.556 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:54.3040160Z #44 0.556 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:54.3041337Z #44 0.556 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:54.3042777Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T09:15:54.3044258Z #44 0.556 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from 
file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:54.3045433Z #44 0.556 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:54.3046384Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=9.10.2.21, <9.10.2.21+) 2025-09-07T09:15:54.3047769Z #44 0.556 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=9.10.2.21, <9.10.2.21+ 2025-09-07T09:15:54.3048874Z #44 0.556 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T09:15:54.3049653Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12==9.10.2.21 2025-09-07T09:15:54.3050779Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==9.10.2.21 2025-09-07T09:15:54.3051776Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cudnn-cu12 (==9.10.2.21) 2025-09-07T09:15:54.3053134Z #44 0.556 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T09:15:54.3054208Z #44 0.556 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T09:15:54.3055272Z #44 0.556 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T09:15:54.3056434Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T09:15:54.3057449Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and 
sys_platform == 'linux'} (==9.10.2.21) 2025-09-07T09:15:54.3058783Z #44 0.556 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T09:15:54.3060255Z #44 0.556 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies * 2025-09-07T09:15:54.3061275Z #44 0.556 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T09:15:54.3062034Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T09:15:54.3063090Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.4.1, <12.8.4.1+) 2025-09-07T09:15:54.3064631Z #44 0.556 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=12.8.4.1, <12.8.4.1+ 2025-09-07T09:15:54.3065715Z #44 0.556 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T09:15:54.3066475Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.8.4.1: nvidia-cublas-cu12==12.8.4.1 2025-09-07T09:15:54.3067621Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.8.4.1: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.4.1 2025-09-07T09:15:54.3068643Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cublas-cu12 (==12.8.4.1) 2025-09-07T09:15:54.3069701Z #44 0.556 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T09:15:54.3070741Z #44 0.556 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T09:15:54.3071806Z 
#44 0.556 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T09:15:54.3073372Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.4.1) 2025-09-07T09:15:54.3074680Z #44 0.556 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T09:15:54.3075707Z #44 0.556 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T09:15:54.3076652Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.3.3.83, <11.3.3.83+) 2025-09-07T09:15:54.3078158Z #44 0.556 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=11.3.3.83, <11.3.3.83+ 2025-09-07T09:15:54.3079314Z #44 0.556 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T09:15:54.3080080Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-cufft-cu12==11.3.3.83 2025-09-07T09:15:54.3081225Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.3.3.83 2025-09-07T09:15:54.3082236Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cufft-cu12 (==11.3.3.83) 2025-09-07T09:15:54.3083393Z #44 0.556 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T09:15:54.3084510Z #44 0.556 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] 
(installed) 2025-09-07T09:15:54.3085645Z #44 0.556 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T09:15:54.3086880Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-nvjitlink-cu12* 2025-09-07T09:15:54.3087919Z #44 0.556 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.3.3.83) 2025-09-07T09:15:54.3089317Z #44 0.556 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies * 2025-09-07T09:15:54.3090920Z #44 0.556 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T09:15:54.3092476Z #44 0.556 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T09:15:54.3093259Z #44 0.556 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-nvjitlink-cu12* 2025-09-07T09:15:54.3094335Z #44 0.556 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=10.3.9.90, <10.3.9.90+) 2025-09-07T09:15:54.3095789Z #44 0.556 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=10.3.9.90, <10.3.9.90+ 2025-09-07T09:15:54.3096928Z #44 0.556 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T09:15:54.3097721Z #44 0.556 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.9.90: nvidia-curand-cu12==10.3.9.90 2025-09-07T09:15:54.3098908Z #44 0.556 DEBUG Adding transitive dependency for 
nvidia-curand-cu12==10.3.9.90: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==10.3.9.90 2025-09-07T09:15:54.3100036Z #44 0.557 DEBUG Searching for a compatible version of nvidia-curand-cu12 (==10.3.9.90) 2025-09-07T09:15:54.3101155Z #44 0.557 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T09:15:54.3102252Z #44 0.557 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T09:15:54.3103329Z #44 0.557 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T09:15:54.3104789Z #44 0.557 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==10.3.9.90) 2025-09-07T09:15:54.3106157Z #44 0.557 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T09:15:54.3107202Z #44 0.557 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T09:15:54.3108165Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.7.3.90, <11.7.3.90+) 2025-09-07T09:15:54.3109648Z #44 0.557 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=11.7.3.90, <11.7.3.90+ 2025-09-07T09:15:54.3110780Z #44 0.557 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T09:15:54.3111599Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusolver-cu12==11.7.3.90 2025-09-07T09:15:54.3112784Z #44 0.557 DEBUG Adding 
transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.7.3.90 2025-09-07T09:15:54.3113857Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cusolver-cu12 (==11.7.3.90) 2025-09-07T09:15:54.3114975Z #44 0.557 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T09:15:54.3116100Z #44 0.557 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T09:15:54.3117200Z #44 0.557 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T09:15:54.3118370Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cublas-cu12* 2025-09-07T09:15:54.3119246Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-nvjitlink-cu12* 2025-09-07T09:15:54.3120133Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusparse-cu12* 2025-09-07T09:15:54.3121161Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.7.3.90) 2025-09-07T09:15:54.3122525Z #44 0.557 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T09:15:54.3124090Z #44 0.557 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies * 2025-09-07T09:15:54.3125226Z #44 0.557 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T09:15:54.3125998Z #44 0.557 DEBUG Adding 
transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cublas-cu12* 2025-09-07T09:15:54.3126858Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-nvjitlink-cu12* 2025-09-07T09:15:54.3127773Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusparse-cu12* 2025-09-07T09:15:54.3128861Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.5.8.93, <12.5.8.93+) 2025-09-07T09:15:54.3130385Z #44 0.557 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.5.8.93, <12.5.8.93+ 2025-09-07T09:15:54.3131612Z #44 0.557 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T09:15:54.3132494Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-cusparse-cu12==12.5.8.93 2025-09-07T09:15:54.3133946Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.5.8.93 2025-09-07T09:15:54.3135055Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cusparse-cu12 (==12.5.8.93) 2025-09-07T09:15:54.3136293Z #44 0.557 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T09:15:54.3137548Z #44 0.557 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T09:15:54.3138357Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-nvjitlink-cu12* 2025-09-07T09:15:54.3139679Z #44 0.557 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from 
file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T09:15:54.3141176Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.5.8.93) 2025-09-07T09:15:54.3142657Z #44 0.557 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T09:15:54.3143877Z #44 0.557 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T09:15:54.3144839Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-nvjitlink-cu12* 2025-09-07T09:15:54.3145907Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=0.7.1, <0.7.1+) 2025-09-07T09:15:54.3147314Z #44 0.557 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies >=0.7.1, <0.7.1+ 2025-09-07T09:15:54.3148428Z #44 0.557 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T09:15:54.3149219Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12==0.7.1 2025-09-07T09:15:54.3150670Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==0.7.1 2025-09-07T09:15:54.3151730Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12 (==0.7.1) 2025-09-07T09:15:54.3152828Z #44 0.557 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T09:15:54.3153894Z #44 0.557 
DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T09:15:54.3154954Z #44 0.557 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T09:15:54.3156331Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==0.7.1) 2025-09-07T09:15:54.3157672Z #44 0.557 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T09:15:54.3158732Z #44 0.557 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T09:15:54.3159661Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=2.27.5, <2.27.5+) 2025-09-07T09:15:54.3161057Z #44 0.557 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=2.27.5, <2.27.5+ 2025-09-07T09:15:54.3162209Z #44 0.557 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T09:15:54.3162931Z #44 0.557 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12==2.27.5 2025-09-07T09:15:54.3163989Z #44 0.557 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==2.27.5 2025-09-07T09:15:54.3164963Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nccl-cu12 (==2.27.5) 2025-09-07T09:15:54.3166090Z #44 0.557 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T09:15:54.3167173Z #44 0.557 DEBUG Selecting: 
nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T09:15:54.3168256Z #44 0.557 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T09:15:54.3169587Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==2.27.5) 2025-09-07T09:15:54.3170938Z #44 0.557 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T09:15:54.3172016Z #44 0.557 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T09:15:54.3173231Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=3.3.20, <3.3.20+) 2025-09-07T09:15:54.3174748Z #44 0.557 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=3.3.20, <3.3.20+ 2025-09-07T09:15:54.3175957Z #44 0.557 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T09:15:54.3176735Z #44 0.557 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12==3.3.20 2025-09-07T09:15:54.3177907Z #44 0.557 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.3.20 2025-09-07T09:15:54.3178951Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12 (==3.3.20) 2025-09-07T09:15:54.3180163Z #44 0.557 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T09:15:54.3181843Z #44 0.557 
DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T09:15:54.3182994Z #44 0.557 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T09:15:54.3183930Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.3.20) 2025-09-07T09:15:54.3185480Z #44 0.557 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T09:15:54.3186600Z #44 0.557 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T09:15:54.3187523Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T09:15:54.3188928Z #44 0.557 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T09:15:54.3190083Z #44 0.557 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:54.3190875Z #44 0.557 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.8.90: nvidia-nvtx-cu12==12.8.90 2025-09-07T09:15:54.3192112Z #44 0.557 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.8.90: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T09:15:54.3193287Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nvtx-cu12 (==12.8.90) 2025-09-07T09:15:54.3194578Z #44 0.557 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 
2025-09-07T09:15:54.3196195Z #44 0.557 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:54.3197302Z #44 0.557 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:54.3198219Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T09:15:54.3199625Z #44 0.557 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:15:54.3200738Z #44 0.557 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T09:15:54.3201750Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.93, <12.8.93+) 2025-09-07T09:15:54.3203296Z #44 0.557 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.8.93, <12.8.93+ 2025-09-07T09:15:54.3204658Z #44 0.557 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:54.3205555Z #44 0.557 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.8.93: nvidia-nvjitlink-cu12==12.8.93 2025-09-07T09:15:54.3206895Z #44 0.557 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.8.93: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.93 2025-09-07T09:15:54.3207961Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12 (==12.8.93) 2025-09-07T09:15:54.3209167Z #44 0.557 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from 
file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:54.3210841Z #44 0.557 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:54.3212019Z #44 0.557 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:54.3213216Z #44 0.557 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.93) 2025-09-07T09:15:54.3214763Z #44 0.557 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:15:54.3215963Z #44 0.557 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T09:15:54.3216941Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=1.13.1.3, <1.13.1.3+) 2025-09-07T09:15:54.3218468Z #44 0.557 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=1.13.1.3, <1.13.1.3+ 2025-09-07T09:15:54.3219732Z #44 0.557 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T09:15:54.3220519Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.13.1.3: nvidia-cufile-cu12==1.13.1.3 2025-09-07T09:15:54.3221688Z #44 0.557 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.13.1.3: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==1.13.1.3 2025-09-07T09:15:54.3222732Z #44 0.557 DEBUG Searching for a compatible version of nvidia-cufile-cu12 (==1.13.1.3) 
2025-09-07T09:15:54.3223972Z #44 0.557 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T09:15:54.3225989Z #44 0.557 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T09:15:54.3227115Z #44 0.557 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T09:15:54.3228031Z #44 0.558 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==1.13.1.3) 2025-09-07T09:15:54.3229439Z #44 0.558 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T09:15:54.3230654Z #44 0.558 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T09:15:54.3231581Z #44 0.558 DEBUG Searching for a compatible version of pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T09:15:54.3233259Z #44 0.558 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:15:54.3234556Z #44 0.558 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T09:15:54.3235407Z #44 0.558 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton==3.4.0+gitf7888497 2025-09-07T09:15:54.3236620Z #44 0.558 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.4.0+gitf7888497 
2025-09-07T09:15:54.3237820Z #44 0.558 DEBUG Searching for a compatible version of pytorch-triton (==3.4.0+gitf7888497) 2025-09-07T09:15:54.3239299Z #44 0.558 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:15:54.3240578Z #44 0.558 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T09:15:54.3241876Z #44 0.558 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:15:54.3243284Z #44 0.558 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T09:15:54.3244296Z #44 0.558 DEBUG Searching for a compatible version of pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T09:15:54.3246129Z #44 0.558 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:15:54.3247644Z #44 0.558 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies >=40.8.0 2025-09-07T09:15:54.3248568Z #44 0.558 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T09:15:54.3249375Z #44 0.558 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T09:15:54.3250055Z #44 0.558 DEBUG Searching for a compatible version of numpy (*) 2025-09-07T09:15:54.3250613Z #44 0.558 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T09:15:54.3251145Z #44 0.558 DEBUG Selecting: numpy==2.2.6 [installed] 
(installed) 2025-09-07T09:15:54.3251675Z #44 0.558 DEBUG Searching for a compatible version of filelock (*) 2025-09-07T09:15:54.3252745Z #44 0.558 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T09:15:54.3253550Z #44 0.558 DEBUG Selecting: filelock==3.19.1 [installed] (installed) 2025-09-07T09:15:54.3254169Z #44 0.558 DEBUG Searching for a compatible version of typing-extensions (>=4.10.0) 2025-09-07T09:15:54.3255171Z #44 0.558 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T09:15:54.3256161Z #44 0.558 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T09:15:54.3256909Z #44 0.558 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (*) 2025-09-07T09:15:54.3257885Z #44 0.558 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies * 2025-09-07T09:15:54.3258741Z #44 0.558 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T09:15:54.3259415Z #44 0.558 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools==78.1.0 2025-09-07T09:15:54.3260291Z #44 0.558 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools{python_full_version >= '3.12'}==78.1.0 2025-09-07T09:15:54.3261088Z #44 0.558 DEBUG Searching for a compatible version of setuptools (==78.1.0) 2025-09-07T09:15:54.3262002Z #44 0.558 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T09:15:54.3262883Z #44 0.558 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T09:15:54.3263751Z #44 0.558 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T09:15:54.3264886Z #44 
0.558 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (==78.1.0) 2025-09-07T09:15:54.3265999Z #44 0.558 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T09:15:54.3266811Z #44 0.558 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T09:15:54.3267522Z #44 0.558 DEBUG Searching for a compatible version of sympy (>=1.13.3) 2025-09-07T09:15:54.3268307Z #44 0.558 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T09:15:54.3269080Z #44 0.558 DEBUG Selecting: sympy==1.14.0 [installed] (installed) 2025-09-07T09:15:54.3269660Z #44 0.558 DEBUG Adding transitive dependency for sympy==1.14.0: mpmath>=1.1.0, <1.4 2025-09-07T09:15:54.3270420Z #44 0.558 DEBUG Searching for a compatible version of networkx (>=2.5.1) 2025-09-07T09:15:54.3271218Z #44 0.558 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T09:15:54.3271973Z #44 0.558 DEBUG Selecting: networkx==3.5 [installed] (installed) 2025-09-07T09:15:54.3272661Z #44 0.558 DEBUG Searching for a compatible version of jinja2 (*) 2025-09-07T09:15:54.3273402Z #44 0.558 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T09:15:54.3274145Z #44 0.558 DEBUG Selecting: jinja2==3.1.6 [installed] (installed) 2025-09-07T09:15:54.3274723Z #44 0.558 DEBUG Adding transitive dependency for jinja2==3.1.6: markupsafe>=2.0 2025-09-07T09:15:54.3275355Z #44 0.558 DEBUG Searching for a compatible version of fsspec (>=0.8.5) 2025-09-07T09:15:54.3276190Z #44 0.558 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T09:15:54.3276995Z #44 0.558 DEBUG Selecting: fsspec==2025.7.0 [installed] 
(installed) 2025-09-07T09:15:54.3277576Z #44 0.558 DEBUG Found fresh response for: https://pypi.org/simple/mpmath/ 2025-09-07T09:15:54.3278412Z #44 0.558 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T09:15:54.3279386Z #44 0.558 DEBUG Searching for a compatible version of mpmath (>=1.1.0, <1.4) 2025-09-07T09:15:54.3280208Z #44 0.558 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T09:15:54.3280966Z #44 0.558 DEBUG Selecting: mpmath==1.3.0 [installed] (installed) 2025-09-07T09:15:54.3281535Z #44 0.559 DEBUG Found fresh response for: https://pypi.org/simple/markupsafe/ 2025-09-07T09:15:54.3282744Z #44 0.559 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T09:15:54.3283774Z #44 0.559 DEBUG Searching for a compatible version of markupsafe (>=2.0) 2025-09-07T09:15:54.3284798Z #44 0.559 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T09:15:54.3285818Z #44 0.559 DEBUG Selecting: markupsafe==3.0.2 [installed] (installed) 2025-09-07T09:15:54.3288395Z #44 0.559 DEBUG Tried 28 versions: filelock 1, fsspec 1, jinja2 1, markupsafe 1, mpmath 1, networkx 1, numpy 1, nvidia-cublas-cu12 1, nvidia-cuda-cupti-cu12 1, nvidia-cuda-nvrtc-cu12 1, nvidia-cuda-runtime-cu12 1, nvidia-cudnn-cu12 1, nvidia-cufft-cu12 1, nvidia-cufile-cu12 1, nvidia-curand-cu12 1, nvidia-cusolver-cu12 1, nvidia-cusparse-cu12 1, nvidia-cusparselt-cu12 1, nvidia-nccl-cu12 1, nvidia-nvjitlink-cu12 1, nvidia-nvshmem-cu12 1, nvidia-nvtx-cu12 1, pytorch-triton 1, setuptools 1, sympy 1, torch 1, typing-extensions 1, xformers 1 2025-09-07T09:15:54.3291000Z #44 0.559 DEBUG marker 
environment resolution took 0.009s 2025-09-07T09:15:54.3291425Z #44 0.560 Resolved 28 packages in 12ms 2025-09-07T09:15:54.3292756Z #44 0.560 DEBUG Requirement already installed: nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:54.3294264Z #44 0.560 DEBUG Requirement already installed: nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T09:15:54.3295639Z #44 0.560 DEBUG Requirement already installed: nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:15:54.3296959Z #44 0.560 DEBUG Identified uncached distribution: xformers @ file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T09:15:54.3298371Z #44 0.560 DEBUG Requirement already installed: nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:54.3299931Z #44 0.560 DEBUG Requirement already installed: nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:54.3301385Z #44 0.560 DEBUG Requirement already installed: nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:15:54.3302563Z #44 0.560 DEBUG Requirement already installed: sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T09:15:54.3303812Z #44 0.560 DEBUG Requirement already installed: nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:54.3305423Z #44 0.560 DEBUG Requirement already installed: nvidia-curand-cu12==10.3.9.90 (from 
file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:15:54.3306845Z #44 0.560 DEBUG Requirement already installed: nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T09:15:54.3308350Z #44 0.560 DEBUG Requirement already installed: nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:15:54.3309574Z #44 0.560 DEBUG Requirement already installed: jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T09:15:54.3310859Z #44 0.560 DEBUG Requirement already installed: pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:54.3312378Z #44 0.560 DEBUG Requirement already installed: nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:54.3313576Z #44 0.560 DEBUG Requirement already installed: setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T09:15:54.3314750Z #44 0.560 DEBUG Requirement already installed: markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:15:54.3315921Z #44 0.560 DEBUG Requirement already installed: filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T09:15:54.3316739Z #44 0.560 DEBUG Requirement already installed: numpy==2.2.6 2025-09-07T09:15:54.3317732Z #44 0.560 DEBUG Requirement already installed: nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:54.3319358Z #44 0.560 DEBUG Requirement already installed: torch==2.9.0.dev20250906+cu128 (from 
file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:15:54.3320774Z #44 0.560 DEBUG Requirement already installed: nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:54.3322036Z #44 0.560 DEBUG Requirement already installed: networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T09:15:54.3323034Z #44 0.560 DEBUG Requirement already installed: typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T09:15:54.3324219Z #44 0.560 DEBUG Requirement already installed: fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T09:15:54.3325396Z #44 0.560 DEBUG Requirement already installed: nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:15:54.3326547Z #44 0.560 DEBUG Requirement already installed: mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T09:15:54.3327744Z #44 0.560 DEBUG Requirement already installed: nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T09:15:54.3328766Z #44 0.560 DEBUG Unnecessary package: pyyaml==6.0.2 2025-09-07T09:15:54.3329239Z #44 0.560 DEBUG Unnecessary package: aiohappyeyeballs==2.6.1 2025-09-07T09:15:54.3329699Z #44 0.560 DEBUG Unnecessary package: aiohttp==3.12.15 2025-09-07T09:15:54.3330144Z #44 0.560 DEBUG Unnecessary package: aiosignal==1.4.0 2025-09-07T09:15:54.3330694Z #44 0.560 DEBUG Unnecessary package: annotated-types==0.7.0 2025-09-07T09:15:54.3331308Z #44 0.560 DEBUG Unnecessary package: anyio==4.10.0 2025-09-07T09:15:54.3331710Z #44 0.560 DEBUG Unnecessary package: astor==0.8.1 2025-09-07T09:15:54.3332195Z #44 0.560 DEBUG Unnecessary package: attrs==25.3.0 2025-09-07T09:15:54.3332819Z #44 0.560 
DEBUG Unnecessary package: blake3==1.0.5 2025-09-07T09:15:54.3333248Z #44 0.560 DEBUG Unnecessary package: build==1.3.0 2025-09-07T09:15:54.3333699Z #44 0.560 DEBUG Unnecessary package: cachetools==6.2.0 2025-09-07T09:15:54.3334136Z #44 0.560 DEBUG Unnecessary package: cbor2==5.7.0 2025-09-07T09:15:54.3334583Z #44 0.560 DEBUG Unnecessary package: certifi==2025.8.3 2025-09-07T09:15:54.3335010Z #44 0.560 DEBUG Unnecessary package: cffi==1.17.1 2025-09-07T09:15:54.3335491Z #44 0.560 DEBUG Unnecessary package: charset-normalizer==3.4.3 2025-09-07T09:15:54.3336005Z #44 0.560 DEBUG Unnecessary package: click==8.2.1 2025-09-07T09:15:54.3336458Z #44 0.560 DEBUG Unnecessary package: cloudpickle==3.1.1 2025-09-07T09:15:54.3336966Z #44 0.560 DEBUG Unnecessary package: compressed-tensors==0.11.0 2025-09-07T09:15:54.3337469Z #44 0.560 DEBUG Unnecessary package: cupy-cuda12x==13.6.0 2025-09-07T09:15:54.3337931Z #44 0.560 DEBUG Unnecessary package: depyf==0.19.0 2025-09-07T09:15:54.3338354Z #44 0.560 DEBUG Unnecessary package: dill==0.4.0 2025-09-07T09:15:54.3338791Z #44 0.560 DEBUG Unnecessary package: diskcache==5.6.3 2025-09-07T09:15:54.3339220Z #44 0.560 DEBUG Unnecessary package: distro==1.9.0 2025-09-07T09:15:54.3339668Z #44 0.560 DEBUG Unnecessary package: dnspython==2.7.0 2025-09-07T09:15:54.3340092Z #44 0.560 DEBUG Unnecessary package: einops==0.8.1 2025-09-07T09:15:54.3340559Z #44 0.560 DEBUG Unnecessary package: email-validator==2.3.0 2025-09-07T09:15:54.3341038Z #44 0.560 DEBUG Unnecessary package: fastapi==0.116.1 2025-09-07T09:15:54.3341623Z #44 0.560 DEBUG Unnecessary package: fastapi-cli==0.0.10 2025-09-07T09:15:54.3352535Z #44 0.560 DEBUG Unnecessary package: fastapi-cloud-cli==0.1.5 2025-09-07T09:15:54.3353038Z #44 0.560 DEBUG Unnecessary package: fastrlock==0.8.3 2025-09-07T09:15:54.3353503Z #44 0.560 DEBUG Unnecessary package: frozendict==2.4.6 2025-09-07T09:15:54.3353946Z #44 0.560 DEBUG Unnecessary package: frozenlist==1.7.0 2025-09-07T09:15:54.3354388Z #44 
0.560 DEBUG Unnecessary package: gguf==0.17.1 2025-09-07T09:15:54.3354801Z #44 0.560 DEBUG Unnecessary package: h11==0.16.0 2025-09-07T09:15:54.3355199Z #44 0.560 DEBUG Unnecessary package: hf-xet==1.1.9 2025-09-07T09:15:54.3355645Z #44 0.560 DEBUG Unnecessary package: httpcore==1.0.9 2025-09-07T09:15:54.3356053Z #44 0.560 DEBUG Unnecessary package: httptools==0.6.4 2025-09-07T09:15:54.3356462Z #44 0.560 DEBUG Unnecessary package: httpx==0.28.1 2025-09-07T09:15:54.3356906Z #44 0.560 DEBUG Unnecessary package: huggingface-hub==0.34.4 2025-09-07T09:15:54.3357357Z #44 0.560 DEBUG Unnecessary package: idna==3.10 2025-09-07T09:15:54.3357787Z #44 0.560 DEBUG Unnecessary package: interegular==0.3.3 2025-09-07T09:15:54.3358211Z #44 0.560 DEBUG Unnecessary package: jiter==0.10.0 2025-09-07T09:15:54.3358647Z #44 0.560 DEBUG Unnecessary package: jsonschema==4.25.1 2025-09-07T09:15:54.3359166Z #44 0.560 DEBUG Unnecessary package: jsonschema-specifications==2025.4.1 2025-09-07T09:15:54.3359678Z #44 0.560 DEBUG Unnecessary package: lark==1.2.2 2025-09-07T09:15:54.3360097Z #44 0.560 DEBUG Unnecessary package: llguidance==0.7.30 2025-09-07T09:15:54.3360546Z #44 0.560 DEBUG Unnecessary package: llvmlite==0.44.0 2025-09-07T09:15:54.3361104Z #44 0.560 DEBUG Unnecessary package: lm-format-enforcer==0.11.3 2025-09-07T09:15:54.3361613Z #44 0.560 DEBUG Unnecessary package: markdown-it-py==4.0.0 2025-09-07T09:15:54.3362063Z #44 0.560 DEBUG Unnecessary package: mdurl==0.1.2 2025-09-07T09:15:54.3362498Z #44 0.560 DEBUG Unnecessary package: mistral-common==1.8.4 2025-09-07T09:15:54.3362952Z #44 0.560 DEBUG Unnecessary package: msgpack==1.1.1 2025-09-07T09:15:54.3363378Z #44 0.560 DEBUG Unnecessary package: msgspec==0.19.0 2025-09-07T09:15:54.3363827Z #44 0.560 DEBUG Unnecessary package: multidict==6.6.4 2025-09-07T09:15:54.3364345Z #44 0.560 DEBUG Unnecessary package: ninja==1.13.0 2025-09-07T09:15:54.3364751Z #44 0.560 DEBUG Unnecessary package: numba==0.61.2 2025-09-07T09:15:54.3365167Z 
#44 0.560 DEBUG Unnecessary package: openai==1.106.1 2025-09-07T09:15:54.3365635Z #44 0.560 DEBUG Unnecessary package: openai-harmony==0.0.4 2025-09-07T09:15:54.3366156Z #44 0.560 DEBUG Unnecessary package: opencv-python-headless==4.12.0.88 2025-09-07T09:15:54.3366652Z #44 0.560 DEBUG Unnecessary package: opt-einsum==3.4.0 2025-09-07T09:15:54.3367096Z #44 0.560 DEBUG Unnecessary package: outlines-core==0.2.10 2025-09-07T09:15:54.3367527Z #44 0.560 DEBUG Unnecessary package: packaging==25.0 2025-09-07T09:15:54.3368025Z #44 0.560 DEBUG Unnecessary package: partial-json-parser==0.2.1.1.post6 2025-09-07T09:15:54.3368931Z #44 0.560 DEBUG Unnecessary package: pillow==11.3.0 (from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:15:54.3369702Z #44 0.560 DEBUG Preserving seed package: pip==25.2 2025-09-07T09:15:54.3370164Z #44 0.560 DEBUG Unnecessary package: prometheus-client==0.22.1 2025-09-07T09:15:54.3370729Z #44 0.560 DEBUG Unnecessary package: prometheus-fastapi-instrumentator==7.1.0 2025-09-07T09:15:54.3371282Z #44 0.560 DEBUG Unnecessary package: propcache==0.3.2 2025-09-07T09:15:54.3371700Z #44 0.560 DEBUG Unnecessary package: protobuf==6.32.0 2025-09-07T09:15:54.3372211Z #44 0.560 DEBUG Unnecessary package: psutil==7.0.0 2025-09-07T09:15:54.3372834Z #44 0.560 DEBUG Unnecessary package: py-cpuinfo==9.0.0 2025-09-07T09:15:54.3373280Z #44 0.560 DEBUG Unnecessary package: pybase64==1.4.2 2025-09-07T09:15:54.3373740Z #44 0.560 DEBUG Unnecessary package: pycountry==24.6.1 2025-09-07T09:15:54.3374184Z #44 0.560 DEBUG Unnecessary package: pycparser==2.22 2025-09-07T09:15:54.3374638Z #44 0.560 DEBUG Unnecessary package: pydantic==2.11.7 2025-09-07T09:15:54.3375143Z #44 0.560 DEBUG Unnecessary package: pydantic-core==2.33.2 2025-09-07T09:15:54.3375673Z #44 0.560 DEBUG Unnecessary package: pydantic-extra-types==2.10.5 2025-09-07T09:15:54.3376171Z #44 0.560 DEBUG Unnecessary package: pygments==2.19.2 
2025-09-07T09:15:54.3376648Z #44 0.560 DEBUG Unnecessary package: pyproject-hooks==1.2.0 2025-09-07T09:15:54.3377153Z #44 0.560 DEBUG Unnecessary package: python-dotenv==1.1.1 2025-09-07T09:15:54.3377649Z #44 0.560 DEBUG Unnecessary package: python-json-logger==3.3.0 2025-09-07T09:15:54.3378172Z #44 0.560 DEBUG Unnecessary package: python-multipart==0.0.20 2025-09-07T09:15:54.3378638Z #44 0.560 DEBUG Unnecessary package: pyzmq==27.0.2 2025-09-07T09:15:54.3379063Z #44 0.560 DEBUG Unnecessary package: ray==2.49.1 2025-09-07T09:15:54.3379494Z #44 0.560 DEBUG Unnecessary package: referencing==0.36.2 2025-09-07T09:15:54.3379954Z #44 0.560 DEBUG Unnecessary package: regex==2025.9.1 2025-09-07T09:15:54.3380398Z #44 0.560 DEBUG Unnecessary package: requests==2.32.5 2025-09-07T09:15:54.3380824Z #44 0.560 DEBUG Unnecessary package: rich==14.1.0 2025-09-07T09:15:54.3381276Z #44 0.560 DEBUG Unnecessary package: rich-toolkit==0.15.1 2025-09-07T09:15:54.3381719Z #44 0.560 DEBUG Unnecessary package: rignore==0.6.4 2025-09-07T09:15:54.3382165Z #44 0.560 DEBUG Unnecessary package: rpds-py==0.27.1 2025-09-07T09:15:54.3382611Z #44 0.560 DEBUG Unnecessary package: safetensors==0.6.2 2025-09-07T09:15:54.3383067Z #44 0.560 DEBUG Unnecessary package: scipy==1.16.1 2025-09-07T09:15:54.3383513Z #44 0.560 DEBUG Unnecessary package: sentencepiece==0.2.1 2025-09-07T09:15:54.3384023Z #44 0.560 DEBUG Unnecessary package: sentry-sdk==2.37.0 2025-09-07T09:15:54.3384599Z #44 0.560 DEBUG Unnecessary package: setproctitle==1.3.7 2025-09-07T09:15:54.3385147Z #44 0.560 DEBUG Unnecessary package: shellingham==1.5.4 2025-09-07T09:15:54.3385564Z #44 0.560 DEBUG Unnecessary package: six==1.17.0 2025-09-07T09:15:54.3385954Z #44 0.560 DEBUG Unnecessary package: sniffio==1.3.1 2025-09-07T09:15:54.3386384Z #44 0.560 DEBUG Unnecessary package: soundfile==0.13.1 2025-09-07T09:15:54.3386802Z #44 0.560 DEBUG Unnecessary package: soxr==0.5.0.post1 2025-09-07T09:15:54.3387233Z #44 0.560 DEBUG Unnecessary package: 
starlette==0.47.3 2025-09-07T09:15:54.3387659Z #44 0.560 DEBUG Unnecessary package: tiktoken==0.11.0 2025-09-07T09:15:54.3388076Z #44 0.560 DEBUG Unnecessary package: tokenizers==0.22.0 2025-09-07T09:15:54.3389020Z #44 0.560 DEBUG Unnecessary package: torchaudio==2.8.0.dev20250906+cu128 (from file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:15:54.3390389Z #44 0.560 DEBUG Unnecessary package: torchvision==0.24.0.dev20250906+cu128 (from file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:15:54.3391287Z #44 0.560 DEBUG Unnecessary package: tqdm==4.67.1 2025-09-07T09:15:54.3391715Z #44 0.560 DEBUG Unnecessary package: transformers==4.56.1 2025-09-07T09:15:54.3392487Z #44 0.560 DEBUG Unnecessary package: triton==3.4.0 2025-09-07T09:15:54.3392999Z #44 0.560 DEBUG Unnecessary package: typer==0.17.4 2025-09-07T09:15:54.3393465Z #44 0.560 DEBUG Unnecessary package: typing-inspection==0.4.1 2025-09-07T09:15:54.3393945Z #44 0.560 DEBUG Unnecessary package: urllib3==2.5.0 2025-09-07T09:15:54.3394366Z #44 0.560 DEBUG Preserving seed package: uv==0.8.4 2025-09-07T09:15:54.3394812Z #44 0.560 DEBUG Unnecessary package: uvicorn==0.35.0 2025-09-07T09:15:54.3395238Z #44 0.560 DEBUG Unnecessary package: uvloop==0.21.0 2025-09-07T09:15:54.3396221Z #44 0.560 DEBUG Unnecessary package: vllm==0.10.2rc2.dev125+g4172235ab.d20250907 (from file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-linux_x86_64.whl) 2025-09-07T09:15:54.3397224Z #44 0.560 DEBUG Unnecessary package: watchfiles==1.1.0 2025-09-07T09:15:54.3397674Z #44 0.560 DEBUG Unnecessary package: websockets==15.0.1 2025-09-07T09:15:54.3398120Z #44 0.560 DEBUG Unnecessary package: wheel==0.45.1 2025-09-07T09:15:54.3398600Z #44 0.560 DEBUG Unnecessary package: xgrammar==0.1.23 2025-09-07T09:15:54.3399037Z #44 0.560 DEBUG Unnecessary package: yarl==1.20.1 2025-09-07T09:15:56.3055124Z #44 2.723 Prepared 1 package in 
2.16s 2025-09-07T09:15:56.7527889Z #44 3.170 Installed 1 package in 446ms 2025-09-07T09:15:56.7528777Z #44 3.170 + xformers==0.0.33+5d4b92a5.d20250907 (from file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl) 2025-09-07T09:15:56.9032378Z #44 3.171 DEBUG Released lock at `/tmp/uv-281d6a3886c08524.lock` 2025-09-07T09:16:12.3056263Z #44 DONE 18.7s 2025-09-07T09:16:12.4589552Z 2025-09-07T09:16:12.4590263Z #45 [vllm-base 13/18] RUN pip install build==1.3.0 2025-09-07T09:16:13.1421669Z #45 0.834 Requirement already satisfied: build==1.3.0 in /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages (1.3.0) 2025-09-07T09:16:13.2940784Z #45 0.835 Requirement already satisfied: packaging>=19.1 in /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages (from build==1.3.0) (25.0) 2025-09-07T09:16:13.2942115Z #45 0.836 Requirement already satisfied: pyproject_hooks in /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages (from build==1.3.0) (1.2.0) 2025-09-07T09:16:13.4163888Z #45 1.108 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-09-07T09:16:13.5336943Z #45 DONE 1.2s 2025-09-07T09:16:13.6854609Z 2025-09-07T09:16:13.6855270Z #46 [vllm-base 14/18] RUN pip freeze | grep -E 'setuptools|packaging|build' 2025-09-07T09:16:15.0698997Z #46 1.535 build==1.3.0 2025-09-07T09:16:15.0699368Z #46 1.535 packaging==25.0 2025-09-07T09:16:15.0699804Z #46 1.535 setuptools @ file:///dist/setuptools-78.1.0-py3-none-any.whl 2025-09-07T09:16:15.2369159Z #46 DONE 1.5s 2025-09-07T09:16:15.2369379Z 2025-09-07T09:16:15.2371227Z #47 [vllm-base 15/18] RUN --mount=type=cache,target=/root/.cache/uv git clone --depth 1 --recursive --shallow-submodules --branch v0.2.14.post1 https://github.com/flashinfer-ai/flashinfer.git flashinfer && echo "Building FlashInfer with AOT for arches: 8.0;8.9;9.0;10.0;12.0" && cd flashinfer && python3 -m flashinfer.aot && python3 -m build --no-isolation --wheel --outdir ../wheels/flashinfer && cd .. && rm -rf flashinfer 2025-09-07T09:16:15.9039248Z #47 0.817 Cloning into 'flashinfer'... 2025-09-07T09:16:16.3863021Z #47 1.300 Note: switching to '038032209794e4ef4608324723efc979a06d5239'. 2025-09-07T09:16:16.3863550Z #47 1.300 2025-09-07T09:16:16.3863959Z #47 1.300 You are in 'detached HEAD' state. You can look around, make experimental 2025-09-07T09:16:16.3864870Z #47 1.300 changes and commit them, and you can discard any commits you make in this 2025-09-07T09:16:16.3865526Z #47 1.300 state without impacting any branches by switching back to a branch. 2025-09-07T09:16:16.3866161Z #47 1.300 2025-09-07T09:16:16.3866557Z #47 1.300 If you want to create a new branch to retain commits you create, you may 2025-09-07T09:16:16.3867154Z #47 1.300 do so (now or later) by using -c with the switch command. 
Example: 2025-09-07T09:16:16.3867597Z #47 1.300 2025-09-07T09:16:16.3867865Z #47 1.300 git switch -c 2025-09-07T09:16:16.3868198Z #47 1.300 2025-09-07T09:16:16.3868455Z #47 1.300 Or undo this operation with: 2025-09-07T09:16:16.3868763Z #47 1.300 2025-09-07T09:16:16.3868997Z #47 1.300 git switch - 2025-09-07T09:16:16.3869257Z #47 1.300 2025-09-07T09:16:16.3869670Z #47 1.300 Turn off this advice by setting config variable advice.detachedHead to false 2025-09-07T09:16:16.3870154Z #47 1.300 2025-09-07T09:16:16.4992390Z #47 1.412 Submodule '3rdparty/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path '3rdparty/cutlass' 2025-09-07T09:16:16.6542330Z #47 1.412 Submodule '3rdparty/spdlog' (https://github.com/gabime/spdlog.git) registered for path '3rdparty/spdlog' 2025-09-07T09:16:16.6543174Z #47 1.417 Cloning into '/workspace/flashinfer/3rdparty/cutlass'... 2025-09-07T09:16:19.0104692Z #47 3.924 Cloning into '/workspace/flashinfer/3rdparty/spdlog'... 2025-09-07T09:16:19.9161226Z #47 4.829 From https://github.com/NVIDIA/cutlass 2025-09-07T09:16:19.9161866Z #47 4.829 * branch e51efbfe18fe4f4cbb66ab814c55bf4aa0185491 -> FETCH_HEAD 2025-09-07T09:16:20.6619384Z #47 5.575 Submodule path '3rdparty/cutlass': checked out 'e51efbfe18fe4f4cbb66ab814c55bf4aa0185491' 2025-09-07T09:16:21.0557861Z #47 5.969 From https://github.com/gabime/spdlog 2025-09-07T09:16:21.0558470Z #47 5.969 * branch c3aed4b68373955e1cc94307683d44dca1515d2b -> FETCH_HEAD 2025-09-07T09:16:21.2363072Z #47 5.995 Submodule path '3rdparty/spdlog': checked out 'c3aed4b68373955e1cc94307683d44dca1515d2b' 2025-09-07T09:16:21.2363834Z #47 5.999 Building FlashInfer with AOT for arches: 8.0;8.9;9.0;10.0;12.0 2025-09-07T09:16:25.1852982Z #47 10.10 W0907 09:16:25.183000 183 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/torch/utils/cpp_extension.py:117] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' 2025-09-07T09:16:25.9414431Z #47 10.85 AOT build summary: 
2025-09-07T09:16:25.9414859Z #47 10.85 out_dir: /workspace/flashinfer/aot-ops 2025-09-07T09:16:25.9415330Z #47 10.85 build_dir: /workspace/flashinfer/build/aot 2025-09-07T09:16:25.9415758Z #47 10.85 fa2_head_dim: [(64, 64), (128, 128)] 2025-09-07T09:16:25.9416149Z #47 10.85 fa3_head_dim: [(192, 128), (128, 128)] 2025-09-07T09:16:25.9416574Z #47 10.85 f16_dtype: [torch.float16, torch.bfloat16] 2025-09-07T09:16:25.9417156Z #47 10.85 f8_dtype: [torch.float8_e4m3fn] 2025-09-07T09:16:25.9417538Z #47 10.85 use_sliding_window: [False] 2025-09-07T09:16:25.9417910Z #47 10.85 use_logits_soft_cap: [False] 2025-09-07T09:16:25.9418289Z #47 10.85 TORCH_CUDA_ARCH_LIST: 8.0;8.9;9.0;10.0;12.0 2025-09-07T09:16:25.9418672Z #47 10.85 has_sm90: True 2025-09-07T09:16:25.9418960Z #47 10.85 has_sm100: True 2025-09-07T09:16:25.9419274Z #47 10.85 add_comm: False 2025-09-07T09:16:25.9419564Z #47 10.85 add_gemma: False 2025-09-07T09:16:25.9419872Z #47 10.85 add_oai_oss: True 2025-09-07T09:16:25.9420164Z #47 10.85 add_moe: False 2025-09-07T09:16:25.9420455Z #47 10.85 add_act: False 2025-09-07T09:16:25.9420732Z #47 10.85 add_misc: True 2025-09-07T09:16:25.9421119Z #47 10.85 Generating JIT specs... 
2025-09-07T09:16:25.9421445Z #47 10.85 Total ops: 60 2025-09-07T09:16:26.0949058Z #47 10.86 ninja: Entering directory `/workspace/flashinfer/build/aot/cached_ops' 2025-09-07T09:17:22.3657101Z #47 67.28 [1/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o 
single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T09:17:24.3363084Z #47 69.25 [2/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o 
batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T09:17:35.2647527Z #47 80.18 [3/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:17:37.3937261Z #47 82.31 [4/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T09:17:37.7747111Z #47 82.69 [5/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:17:37.9569696Z #47 82.71 [6/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T09:17:38.0241501Z #47 82.94 [7/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T09:17:39.6563276Z #47 84.57 [8/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:17:40.2283336Z #47 85.14 [9/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T09:17:40.4258479Z #47 85.34 [10/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T09:17:40.8322185Z #47 85.74 [11/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T09:17:43.8237193Z #47 88.74 [12/412] c++ -MMD -MF logging/logging.o.d -DTORCH_EXTENSION_NAME=logging -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/3rdparty/spdlog/include -I/workspace/flashinfer/include -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include -fPIC -O3 -std=c++17 -Wno-switch-bool -c /workspace/flashinfer/csrc/logging.cc -o logging/logging.o 2025-09-07T09:17:44.9216311Z #47 89.83 [13/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include 
-isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:17:45.0750579Z #47 89.99 [14/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem 
/workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T09:17:46.7444004Z #47 91.66 [15/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem 
/workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:17:47.5742653Z #47 92.49 [16/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem 
/workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:17:49.0042151Z #47 93.92 [17/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem 
/workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T09:17:51.5427524Z #47 96.45 [18/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem 
/workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:17:52.8849683Z #47 97.80 [19/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem 
/usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:17:53.7552267Z #47 98.67 [20/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:17:54.0756159Z #47 98.99 [21/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T09:17:54.5712192Z #47 99.48 [22/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T09:17:54.7274586Z #47 99.49 [23/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T09:17:54.9147143Z #47 99.83 [24/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:17:55.5353679Z #47 100.4 [25/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T09:17:56.2738597Z #47 101.2 [26/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:17:58.6042154Z #47 103.5 [27/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:17:59.1859704Z #47 104.1 [28/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:17:59.4410677Z #47 104.4 [29/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T09:18:01.2675439Z #47 106.2 [30/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T09:18:01.9582739Z #47 106.9 [31/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 
-isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T09:18:06.9661409Z #47 111.9 [32/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem 
/workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T09:18:07.6855036Z #47 112.6 [33/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T09:18:07.7856285Z #47 112.7 [34/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:18:08.7643420Z #47 113.7 [35/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T09:18:09.4752153Z #47 114.4 [36/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem 
/workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T09:18:09.8076802Z #47 114.7 [37/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include 
--compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T09:18:11.5249460Z #47 116.4 [38/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr 
-static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T09:18:11.8157917Z #47 116.7 [39/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false 
-gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:18:12.1349938Z #47 117.0 [40/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr 
-static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:18:14.1054534Z #47 119.0 [41/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr 
-static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T09:18:15.1097043Z #47 120.0 [42/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false 
-gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T09:18:15.5512944Z #47 120.5 [43/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 
-gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:18:16.7138691Z #47 121.6 [44/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T09:18:17.4846367Z #47 122.4 [45/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T09:18:17.7138064Z #47 122.6 [46/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T09:18:19.2977010Z #47 124.2 [47/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T09:18:19.8839643Z #47 124.8 [48/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:18:20.3851552Z #47 125.3 [49/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T09:18:22.6744687Z #47 127.6 [50/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:18:25.8717193Z #47 130.8 [51/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:18:29.9740738Z #47 134.9 [52/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T09:18:35.7243420Z #47 140.6 [53/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 
-DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:18:36.5638562Z #47 141.5 [54/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:18:37.6943208Z #47 142.6 [55/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:18:46.6467529Z #47 151.6 [56/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:18:49.5436163Z #47 154.5 [57/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:18:55.5739106Z #47 160.5 [58/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:19:04.2589972Z #47 169.2 [59/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T09:19:11.7947396Z #47 176.7 [60/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T09:19:12.4251334Z #47 177.3 [61/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T09:19:12.5935642Z #47 177.5 [62/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T09:19:12.9360985Z #47 177.8 [63/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T09:19:17.5249620Z #47 182.4 [64/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:19:17.7170030Z #47 182.6 [65/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T09:19:18.7235285Z #47 183.6 [66/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T09:19:22.8199720Z #47 187.7 [67/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:19:22.9933814Z #47 187.8 [68/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:19:23.4254231Z #47 188.3 [69/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:19:23.5559324Z #47 188.5 [70/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:19:27.7277917Z #47 192.6 [71/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T09:19:28.4951019Z #47 193.4 [72/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T09:19:28.7636770Z #47 193.7 [73/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T09:19:30.3150066Z #47 195.2 [74/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:19:30.7441744Z #47 195.7 [75/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T09:19:30.8738135Z #47 195.7 [76/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T09:19:30.8910965Z #47 195.8 [77/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T09:19:32.6892252Z #47 197.6 [78/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:19:34.8151801Z #47 199.7 [79/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T09:19:35.0554695Z #47 199.8 [80/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T09:19:36.1129310Z #47 201.0 [81/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T09:19:36.3547126Z #47 201.1 [82/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:19:37.9848024Z #47 202.9 [83/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:19:39.5354815Z #47 204.4 [84/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T09:19:40.7174146Z #47 205.6 [85/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:19:41.3851713Z #47 206.3 [86/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:19:41.4940008Z #47 206.4 [87/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:19:43.6570705Z #47 208.6 [88/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T09:19:44.1342790Z #47 209.0 [89/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T09:19:48.9737483Z #47 213.9 [90/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:19:49.1978840Z #47 214.0 [91/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:19:50.0836605Z #47 215.0 [92/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T09:19:51.7945985Z #47 216.7 [93/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:19:52.9486419Z #47 217.9 [94/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:19:53.1047783Z #47 218.0 [95/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T09:19:54.8150081Z #47 219.7 [96/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T09:19:55.1352887Z #47 220.0 [97/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T09:19:55.9359306Z #47 220.8 [98/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:19:57.4748260Z #47 222.4 [99/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T09:19:58.6186200Z #47 223.5 [100/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:20:01.6703398Z #47 226.6 [101/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T09:20:22.3750886Z #47 247.3 [102/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:20:23.4990690Z #47 248.4 [103/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T09:20:27.5950950Z #47 252.5 [104/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T09:20:43.6150839Z #47 268.5 [105/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T09:20:45.7373950Z #47 270.6 [106/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T09:20:50.6883157Z #47 275.6 [107/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T09:20:53.2017442Z #47 278.1 [108/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T09:20:53.3684909Z #47 278.1 [109/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T09:20:56.4048241Z #47 281.3 [110/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T09:20:56.9862727Z #47 281.9 [111/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T09:20:57.3823965Z #47 282.3 [112/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T09:21:00.0450030Z #47 285.0 [113/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T09:21:02.9537987Z #47 287.9 [114/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:21:03.3723939Z #47 288.3 [115/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T09:21:04.1640454Z #47 289.1 [116/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T09:21:04.8792286Z #47 289.8 [117/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T09:21:05.3443170Z #47 290.3 [118/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:21:05.5244477Z #47 290.3 [119/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T09:21:05.7704570Z #47 290.7 [120/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T09:21:05.9877938Z #47 290.7 [121/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:21:06.1046584Z #47 291.0 [122/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T09:21:08.0104143Z #47 292.9 [123/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:21:08.9648054Z #47 293.9 [124/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T09:21:09.1174607Z #47 293.9 [125/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:21:09.6844496Z #47 294.6 [126/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:21:10.7762783Z #47 295.7 [127/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T09:21:12.6683465Z #47 297.6 [128/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 
-DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T09:21:13.2942570Z #47 298.2 [129/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T09:21:15.2091740Z #47 300.1 [130/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T09:21:15.3804929Z #47 300.1 [131/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:21:15.4984091Z #47 300.4 [132/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T09:21:15.8344712Z #47 300.7 [133/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:21:15.9946891Z #47 300.8 [134/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T09:21:17.7547571Z #47 302.7 [135/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T09:21:18.4144547Z #47 303.3 [136/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:21:21.9234955Z #47 306.8 [137/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:21:23.4667148Z #47 308.4 [138/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:21:24.1459777Z #47 309.1 [139/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T09:21:26.7153679Z #47 311.6 [140/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:21:29.7038693Z #47 314.6 [141/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:21:29.8461793Z #47 314.8 [142/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:21:30.0761738Z #47 314.8 [143/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 
-DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T09:21:32.7334900Z #47 317.6 [144/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 
-DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:21:39.0496767Z #47 324.0 [145/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T09:21:40.0440352Z #47 325.0 [146/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:21:44.4151793Z #47 329.3 [147/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 
-DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T09:21:53.9336928Z #47 338.8 [148/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:21:54.9018511Z #47 339.8 [149/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:21:55.2249388Z #47 340.1 [150/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T09:22:03.7211254Z #47 348.6 [151/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T09:22:08.4404919Z #47 353.4 [152/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 2025-09-07T09:22:11.5462486Z #47 356.5 [153/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o 2025-09-07T09:22:12.5059555Z #47 357.4 [154/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T09:22:14.5877941Z #47 359.5 [155/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 2025-09-07T09:22:16.0144865Z #47 360.9 [156/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o 2025-09-07T09:22:16.2064994Z #47 361.1 [157/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o 2025-09-07T09:22:18.3847175Z #47 363.3 [158/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 2025-09-07T09:22:20.0719288Z #47 365.0 [159/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 2025-09-07T09:22:20.2593449Z #47 365.0 [160/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o 2025-09-07T09:22:20.2632040Z #47 365.2 [161/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 2025-09-07T09:22:20.4465197Z #47 365.4 [162/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T09:22:20.9556528Z #47 365.9 [163/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cu -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o 2025-09-07T09:22:21.4161754Z #47 366.3 [164/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:22:26.8556859Z #47 371.8 [165/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:22:27.5436330Z #47 372.5 [166/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:22:28.8338410Z #47 373.7 [167/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 
-use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:22:29.0635853Z #47 374.0 [168/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cu -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o 2025-09-07T09:22:29.3536984Z #47 374.3 [169/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:22:31.8417256Z #47 376.8 [170/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 
-DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:22:33.7142873Z #47 378.6 [171/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 
-O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:22:35.5575984Z #47 380.5 [172/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 
-gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:22:36.1094954Z #47 381.0 [173/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:22:36.3526682Z #47 381.1 [174/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:22:36.4035272Z #47 381.3 [175/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 
-gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T09:22:38.1386893Z #47 383.0 [176/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:22:38.3061640Z #47 383.1 [177/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:22:38.4110133Z #47 383.3 [178/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:22:38.5548645Z #47 383.5 [179/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:22:38.9711686Z #47 383.9 [180/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:22:39.8158773Z #47 384.7 [181/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:22:39.9844689Z #47 384.9 [182/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:22:41.2620332Z #47 386.2 [183/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:22:41.8757848Z #47 386.8 [184/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:22:42.2318297Z #47 387.1 [185/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:22:43.7461789Z #47 388.7 [186/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T09:22:43.9927915Z #47 388.7 [187/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 
-DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:22:43.9963527Z #47 388.8 [188/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 
-DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:22:44.2520613Z #47 389.2 [189/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 
-DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:22:44.4372564Z #47 389.2 [190/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 
-DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:22:44.9345068Z #47 389.8 [191/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o 2025-09-07T09:22:44.9363737Z #47 389.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:22:44.9366585Z #47 389.8 bool use_swa = window_left != -1; 2025-09-07T09:22:44.9367217Z #47 389.8 ^ 2025-09-07T09:22:44.9367638Z #47 389.8 2025-09-07T09:22:44.9368363Z #47 389.8 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:22:44.9369304Z #47 389.8 2025-09-07T09:22:44.9372026Z #47 389.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:22:44.9375228Z #47 389.8 bool use_swa = window_left != -1; 2025-09-07T09:22:44.9375907Z #47 389.8 ^ 2025-09-07T09:22:44.9376343Z #47 389.8 2025-09-07T09:22:45.1062448Z #47 390.0 [192/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T09:22:50.1072020Z #47 395.0 [193/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T09:22:52.9456979Z #47 397.9 [194/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:22:55.7809267Z #47 400.7 [195/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o 2025-09-07T09:22:55.9302886Z #47 400.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:22:55.9306086Z #47 400.7 bool use_swa = 
window_left != -1; 2025-09-07T09:22:55.9306745Z #47 400.7 ^ 2025-09-07T09:22:55.9307183Z #47 400.7 2025-09-07T09:22:55.9308307Z #47 400.7 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:22:55.9309294Z #47 400.7 2025-09-07T09:22:55.9311964Z #47 400.7 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:22:55.9315057Z #47 400.7 bool use_swa = window_left != -1; 2025-09-07T09:22:55.9315725Z #47 400.7 ^ 2025-09-07T09:22:55.9316322Z #47 400.7 2025-09-07T09:23:14.1649723Z #47 419.1 [196/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 
-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:23:14.3143040Z #47 419.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:14.3146467Z #47 419.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:14.3147427Z #47 419.1 ^ 2025-09-07T09:23:14.3147925Z #47 419.1 2025-09-07T09:23:14.3148696Z #47 419.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:14.3149715Z #47 419.1 2025-09-07T09:23:14.3152570Z #47 419.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:14.3159314Z #47 419.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:14.3160287Z #47 419.1 ^ 2025-09-07T09:23:14.3160768Z #47 419.1 2025-09-07T09:23:14.3161594Z #47 419.1 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:23:14.3162596Z #47 419.1 2025-09-07T09:23:14.3165389Z #47 419.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:14.3168881Z #47 419.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:14.3169765Z #47 419.1 ^ 2025-09-07T09:23:14.3170254Z #47 419.1 2025-09-07T09:23:14.3171054Z #47 419.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:14.3172085Z #47 419.1 2025-09-07T09:23:14.3175200Z #47 419.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:14.3178479Z #47 419.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:14.3179431Z #47 419.1 ^ 2025-09-07T09:23:14.3179937Z #47 419.1 2025-09-07T09:23:14.3180748Z #47 419.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:14.3181754Z #47 419.1 2025-09-07T09:23:14.3184611Z #47 419.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:14.3187926Z #47 419.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:14.3188869Z #47 419.1 ^ 2025-09-07T09:23:14.3189408Z #47 419.1 2025-09-07T09:23:14.3190154Z #47 419.1 Remark: The warnings 
can be suppressed with "-diag-suppress " 2025-09-07T09:23:14.3191018Z #47 419.1 2025-09-07T09:23:14.9551279Z #47 419.9 [197/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o 
2025-09-07T09:23:14.9569886Z #47 419.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:14.9573570Z #47 419.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:14.9574524Z #47 419.9 ^ 2025-09-07T09:23:14.9575046Z #47 419.9 2025-09-07T09:23:14.9575817Z #47 419.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:14.9576600Z #47 419.9 2025-09-07T09:23:14.9579317Z #47 419.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:14.9582493Z #47 419.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:14.9583308Z #47 419.9 ^ 2025-09-07T09:23:14.9583771Z #47 419.9 2025-09-07T09:23:14.9584423Z #47 419.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:14.9585224Z #47 419.9 2025-09-07T09:23:14.9587964Z #47 419.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:14.9591114Z #47 419.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:14.9592403Z #47 419.9 ^ 2025-09-07T09:23:14.9592925Z #47 419.9 2025-09-07T09:23:14.9593719Z #47 419.9 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:23:14.9594751Z #47 419.9 2025-09-07T09:23:14.9597704Z #47 419.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:14.9601094Z #47 419.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:14.9601949Z #47 419.9 ^ 2025-09-07T09:23:14.9602408Z #47 419.9 2025-09-07T09:23:14.9603171Z #47 419.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:14.9604179Z #47 419.9 2025-09-07T09:23:14.9607035Z #47 419.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:14.9610269Z #47 419.9 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:14.9611135Z #47 419.9 ^ 2025-09-07T09:23:14.9611597Z #47 419.9 2025-09-07T09:23:14.9612347Z #47 419.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:14.9613458Z #47 419.9 2025-09-07T09:23:15.1990244Z #47 420.0 [198/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:23:15.2012594Z #47 420.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:15.2016193Z #47 420.0 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:15.2017222Z #47 420.0 ^ 2025-09-07T09:23:15.2017742Z #47 420.0 2025-09-07T09:23:15.2018611Z #47 420.0 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:23:15.2019646Z #47 420.0 2025-09-07T09:23:15.2022621Z #47 420.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:15.2026131Z #47 420.0 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:15.2027130Z #47 420.0 ^ 2025-09-07T09:23:15.2027665Z #47 420.0 2025-09-07T09:23:15.2028475Z #47 420.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:15.2029513Z #47 420.0 2025-09-07T09:23:15.2032489Z #47 420.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:15.2036002Z #47 420.0 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:15.2037005Z #47 420.0 ^ 2025-09-07T09:23:15.2037520Z #47 420.0 2025-09-07T09:23:15.2038351Z #47 420.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:15.2039370Z #47 420.0 2025-09-07T09:23:15.2042345Z #47 420.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:15.2045834Z #47 420.0 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:15.2046831Z #47 420.0 ^ 2025-09-07T09:23:15.2047361Z #47 420.0 2025-09-07T09:23:15.2048164Z #47 420.0 Remark: The 
warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:15.2049359Z #47 420.0 2025-09-07T09:23:15.2052779Z #47 420.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:15.2056232Z #47 420.0 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:15.2057223Z #47 420.0 ^ 2025-09-07T09:23:15.2057735Z #47 420.0 2025-09-07T09:23:15.2058709Z #47 420.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:15.2059735Z #47 420.0 2025-09-07T09:23:17.9261364Z #47 422.8 [199/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 
-gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:23:18.0805486Z #47 422.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:18.0808265Z #47 422.8 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:18.0809060Z #47 422.8 ^ 2025-09-07T09:23:18.0809466Z #47 422.8 2025-09-07T09:23:18.0810194Z #47 422.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:18.0810972Z #47 422.8 2025-09-07T09:23:18.0813456Z #47 422.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:18.0816241Z #47 422.8 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:18.0816997Z #47 422.8 ^ 2025-09-07T09:23:18.0817404Z #47 422.8 2025-09-07T09:23:18.0817986Z #47 422.8 Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-09-07T09:23:18.0819093Z #47 422.8 2025-09-07T09:23:18.0821411Z #47 422.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:18.0824667Z #47 422.8 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:18.0825531Z #47 422.8 ^ 2025-09-07T09:23:18.0825998Z #47 422.8 2025-09-07T09:23:18.0826940Z #47 422.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:18.0827777Z #47 422.8 2025-09-07T09:23:18.0830152Z #47 422.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:18.0833157Z #47 422.8 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:18.0834135Z #47 422.8 ^ 2025-09-07T09:23:18.0834589Z #47 422.8 2025-09-07T09:23:18.0835229Z #47 422.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:18.0836023Z #47 422.8 2025-09-07T09:23:18.0838702Z #47 422.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:18.0841578Z #47 422.8 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:18.0842357Z #47 422.8 ^ 2025-09-07T09:23:18.0842776Z #47 422.8 
2025-09-07T09:23:18.0843436Z #47 422.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:18.0844298Z #47 422.8 2025-09-07T09:23:18.4844045Z #47 423.4 [200/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cu -o 
batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o 2025-09-07T09:23:19.1832869Z #47 424.1 [201/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:23:19.3327604Z #47 424.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:19.3331056Z #47 424.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:19.3332023Z #47 424.1 ^ 2025-09-07T09:23:19.3332647Z #47 424.1 2025-09-07T09:23:19.3333461Z #47 424.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:19.3334446Z #47 424.1 2025-09-07T09:23:19.3337220Z #47 424.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:19.3340651Z #47 424.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:19.3341611Z #47 424.1 ^ 2025-09-07T09:23:19.3342132Z #47 424.1 2025-09-07T09:23:19.3342928Z #47 424.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:19.3343942Z #47 424.1 2025-09-07T09:23:19.3346725Z #47 424.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:19.3350080Z #47 424.1 constexpr auto use_custom_mask = MaskMode::kCustom == 
MaskMode::kCustom; 2025-09-07T09:23:19.3351038Z #47 424.1 ^ 2025-09-07T09:23:19.3351543Z #47 424.1 2025-09-07T09:23:19.3352316Z #47 424.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:19.3353284Z #47 424.1 2025-09-07T09:23:19.3356281Z #47 424.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:19.3359615Z #47 424.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:19.3360576Z #47 424.1 ^ 2025-09-07T09:23:19.3361090Z #47 424.1 2025-09-07T09:23:19.3361872Z #47 424.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:19.3362999Z #47 424.1 2025-09-07T09:23:19.3365810Z #47 424.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:19.3369265Z #47 424.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:19.3370197Z #47 424.1 ^ 2025-09-07T09:23:19.3370676Z #47 424.1 2025-09-07T09:23:19.3371454Z #47 424.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:19.3372623Z #47 424.1 2025-09-07T09:23:22.6569746Z #47 427.6 [202/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:23:22.6588155Z #47 427.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:22.6591009Z #47 427.6 constexpr auto 
use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:22.6591813Z #47 427.6 ^ 2025-09-07T09:23:22.6592549Z #47 427.6 2025-09-07T09:23:22.6593241Z #47 427.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:22.6594071Z #47 427.6 2025-09-07T09:23:22.6596465Z #47 427.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:22.6599621Z #47 427.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:22.6600469Z #47 427.6 ^ 2025-09-07T09:23:22.6600932Z #47 427.6 2025-09-07T09:23:22.6601634Z #47 427.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:22.6602448Z #47 427.6 2025-09-07T09:23:22.6605012Z #47 427.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:22.6607974Z #47 427.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:22.6608790Z #47 427.6 ^ 2025-09-07T09:23:22.6609235Z #47 427.6 2025-09-07T09:23:22.6609913Z #47 427.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:22.6610786Z #47 427.6 2025-09-07T09:23:22.6613453Z #47 427.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 
2025-09-07T09:23:22.6616317Z #47 427.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:22.6617141Z #47 427.6 ^ 2025-09-07T09:23:22.6617615Z #47 427.6 2025-09-07T09:23:22.6618295Z #47 427.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:22.6619138Z #47 427.6 2025-09-07T09:23:22.6621651Z #47 427.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:22.6627763Z #47 427.6 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:22.6628646Z #47 427.6 ^ 2025-09-07T09:23:22.6629060Z #47 427.6 2025-09-07T09:23:22.6629771Z #47 427.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:22.6630647Z #47 427.6 2025-09-07T09:23:23.4639136Z #47 428.4 [203/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cu -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 2025-09-07T09:23:24.7897004Z #47 429.7 [204/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T09:23:26.2893343Z #47 431.2 [205/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include 
--compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:23:26.2910719Z #47 431.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:26.2913694Z #47 431.2 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:26.2914649Z #47 431.2 ^ 2025-09-07T09:23:26.2915052Z #47 431.2 2025-09-07T09:23:26.2915706Z #47 431.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:26.2916495Z #47 431.2 2025-09-07T09:23:26.2919093Z #47 431.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:26.2921952Z #47 431.2 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 
2025-09-07T09:23:26.2922772Z #47 431.2 ^ 2025-09-07T09:23:26.2923187Z #47 431.2 2025-09-07T09:23:26.2923859Z #47 431.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:26.2924679Z #47 431.2 2025-09-07T09:23:26.2926948Z #47 431.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:26.2929763Z #47 431.2 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:26.2930608Z #47 431.2 ^ 2025-09-07T09:23:26.2931031Z #47 431.2 2025-09-07T09:23:26.2931678Z #47 431.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:26.2932691Z #47 431.2 2025-09-07T09:23:26.2935018Z #47 431.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:26.2937816Z #47 431.2 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:26.2938661Z #47 431.2 ^ 2025-09-07T09:23:26.2939068Z #47 431.2 2025-09-07T09:23:26.2939704Z #47 431.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:26.2940479Z #47 431.2 2025-09-07T09:23:26.2942733Z #47 431.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:26.2945444Z #47 431.2 
constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:26.2946291Z #47 431.2 ^ 2025-09-07T09:23:26.2946709Z #47 431.2 2025-09-07T09:23:26.2947339Z #47 431.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:26.2948132Z #47 431.2 2025-09-07T09:23:27.5997038Z #47 432.5 [206/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o 2025-09-07T09:23:27.6018507Z #47 432.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:23:27.6021619Z #47 432.5 bool use_swa = window_left != -1; 2025-09-07T09:23:27.6022313Z #47 432.5 ^ 2025-09-07T09:23:27.6022753Z #47 432.5 2025-09-07T09:23:27.6023594Z #47 432.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:27.6024587Z #47 432.5 2025-09-07T09:23:27.6027441Z #47 432.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:23:27.6030626Z #47 432.5 bool use_swa = window_left != -1; 2025-09-07T09:23:27.6031299Z #47 432.5 ^ 2025-09-07T09:23:27.6031758Z #47 432.5 2025-09-07T09:23:38.0151640Z #47 442.9 [207/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o 2025-09-07T09:23:38.0170793Z #47 442.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:23:38.0173947Z #47 442.9 bool use_swa = 
window_left != -1; 2025-09-07T09:23:38.0174548Z #47 442.9 ^ 2025-09-07T09:23:38.0174964Z #47 442.9 2025-09-07T09:23:38.0175670Z #47 442.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:38.0176898Z #47 442.9 2025-09-07T09:23:38.0179704Z #47 442.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:23:38.0182606Z #47 442.9 bool use_swa = window_left != -1; 2025-09-07T09:23:38.0183238Z #47 442.9 ^ 2025-09-07T09:23:38.0183673Z #47 442.9 2025-09-07T09:23:39.7958386Z #47 444.7 [208/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T09:23:42.4551587Z #47 447.4 [209/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:23:42.4571647Z #47 447.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:42.4575083Z #47 447.4 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:42.4576113Z #47 447.4 ^ 2025-09-07T09:23:42.4576609Z #47 447.4 2025-09-07T09:23:42.4577392Z #47 447.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:42.4578378Z #47 447.4 2025-09-07T09:23:42.4581127Z #47 447.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:42.4584569Z #47 447.4 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:42.4585502Z #47 447.4 ^ 2025-09-07T09:23:42.4585999Z #47 447.4 2025-09-07T09:23:42.4586722Z #47 447.4 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:23:42.4587651Z #47 447.4 2025-09-07T09:23:42.4590309Z #47 447.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:42.4593897Z #47 447.4 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:42.4594894Z #47 447.4 ^ 2025-09-07T09:23:42.4595386Z #47 447.4 2025-09-07T09:23:42.4596130Z #47 447.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:42.4597036Z #47 447.4 2025-09-07T09:23:42.4599666Z #47 447.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:42.4603018Z #47 447.4 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:42.4603985Z #47 447.4 ^ 2025-09-07T09:23:42.4604459Z #47 447.4 2025-09-07T09:23:42.4605214Z #47 447.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:42.4606110Z #47 447.4 2025-09-07T09:23:42.4608955Z #47 447.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:42.4612084Z #47 447.4 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:42.4613197Z #47 447.4 ^ 2025-09-07T09:23:42.4613665Z #47 447.4 
2025-09-07T09:23:42.4614365Z #47 447.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:42.4615285Z #47 447.4 2025-09-07T09:23:49.6549055Z #47 454.6 [210/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:23:49.9350888Z #47 454.8 [211/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:23:50.3858566Z #47 455.3 [212/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:23:50.3878163Z #47 455.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:50.3881607Z #47 455.3 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:50.3882492Z #47 455.3 ^ 2025-09-07T09:23:50.3882993Z #47 455.3 2025-09-07T09:23:50.3883779Z #47 455.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:50.3884738Z #47 455.3 2025-09-07T09:23:50.3887343Z #47 455.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:50.3890841Z #47 455.3 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:50.3891766Z #47 455.3 ^ 2025-09-07T09:23:50.3892646Z #47 455.3 2025-09-07T09:23:50.3893373Z #47 455.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:50.3894268Z #47 455.3 2025-09-07T09:23:50.3896824Z #47 455.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:50.3900066Z #47 455.3 constexpr auto use_custom_mask = MaskMode::kNone == 
MaskMode::kCustom; 2025-09-07T09:23:50.3901168Z #47 455.3 ^ 2025-09-07T09:23:50.3901643Z #47 455.3 2025-09-07T09:23:50.3902380Z #47 455.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:50.3903332Z #47 455.3 2025-09-07T09:23:50.3906084Z #47 455.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:50.3909308Z #47 455.3 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:50.3910246Z #47 455.3 ^ 2025-09-07T09:23:50.3910775Z #47 455.3 2025-09-07T09:23:50.3911575Z #47 455.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:50.3912589Z #47 455.3 2025-09-07T09:23:50.3915284Z #47 455.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:50.3918331Z #47 455.3 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:50.3919227Z #47 455.3 ^ 2025-09-07T09:23:50.3919692Z #47 455.3 2025-09-07T09:23:50.3920459Z #47 455.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:50.3921403Z #47 455.3 2025-09-07T09:23:51.0611685Z #47 456.0 [213/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:23:51.0633340Z #47 456.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:51.0636734Z #47 456.0 constexpr 
auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:51.0637789Z #47 456.0 ^ 2025-09-07T09:23:51.0638201Z #47 456.0 2025-09-07T09:23:51.0638945Z #47 456.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:51.0639800Z #47 456.0 2025-09-07T09:23:51.0642507Z #47 456.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:51.0645731Z #47 456.0 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:51.0646655Z #47 456.0 ^ 2025-09-07T09:23:51.0647124Z #47 456.0 2025-09-07T09:23:51.0647843Z #47 456.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:51.0648740Z #47 456.0 2025-09-07T09:23:51.0651352Z #47 456.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:51.0654654Z #47 456.0 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:51.0655531Z #47 456.0 ^ 2025-09-07T09:23:51.0655970Z #47 456.0 2025-09-07T09:23:51.0656729Z #47 456.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:51.0657662Z #47 456.0 2025-09-07T09:23:51.0660265Z #47 456.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 
2025-09-07T09:23:51.0663355Z #47 456.0 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:51.0664238Z #47 456.0 ^ 2025-09-07T09:23:51.0664742Z #47 456.0 2025-09-07T09:23:51.0665537Z #47 456.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:51.0666464Z #47 456.0 2025-09-07T09:23:51.0669049Z #47 456.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:51.0672161Z #47 456.0 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:51.0673083Z #47 456.0 ^ 2025-09-07T09:23:51.0673557Z #47 456.0 2025-09-07T09:23:51.0674321Z #47 456.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:51.0675281Z #47 456.0 2025-09-07T09:23:51.1649759Z #47 456.0 [214/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:23:51.1669075Z #47 456.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:51.1672116Z #47 456.0 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:51.1672959Z #47 456.0 ^ 2025-09-07T09:23:51.1673452Z #47 456.0 2025-09-07T09:23:51.1674168Z #47 456.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:51.1675086Z #47 456.0 2025-09-07T09:23:51.1677778Z #47 456.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 
2025-09-07T09:23:51.1680692Z #47 456.0 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:51.1681556Z #47 456.0 ^ 2025-09-07T09:23:51.1682006Z #47 456.0 2025-09-07T09:23:51.1682761Z #47 456.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:51.1683556Z #47 456.0 2025-09-07T09:23:51.1686118Z #47 456.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:51.1689170Z #47 456.0 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:51.1689968Z #47 456.0 ^ 2025-09-07T09:23:51.1690426Z #47 456.0 2025-09-07T09:23:51.1691176Z #47 456.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:51.1692475Z #47 456.0 2025-09-07T09:23:51.1695084Z #47 456.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:51.1698422Z #47 456.0 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:51.1699433Z #47 456.0 ^ 2025-09-07T09:23:51.1699907Z #47 456.0 2025-09-07T09:23:51.1700650Z #47 456.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:51.1701551Z #47 456.0 2025-09-07T09:23:51.1704518Z #47 456.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable 
"flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:51.1707721Z #47 456.0 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:23:51.1708577Z #47 456.0 ^ 2025-09-07T09:23:51.1709073Z #47 456.0 2025-09-07T09:23:51.1709847Z #47 456.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:51.1710748Z #47 456.0 2025-09-07T09:23:51.1727439Z #47 456.1 [215/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T09:23:53.5596419Z #47 458.5 [216/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:23:55.1671683Z #47 460.1 [217/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:23:55.1690273Z #47 460.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:55.1693624Z #47 460.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:55.1694622Z #47 460.1 ^ 2025-09-07T09:23:55.1695063Z #47 460.1 2025-09-07T09:23:55.1695767Z #47 460.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:55.1696621Z #47 460.1 2025-09-07T09:23:55.1699122Z #47 460.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:55.1702375Z #47 460.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:55.1703226Z #47 460.1 ^ 2025-09-07T09:23:55.1703684Z #47 460.1 2025-09-07T09:23:55.1704391Z #47 460.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:55.1705407Z #47 460.1 2025-09-07T09:23:55.1708059Z #47 460.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:55.1711105Z #47 460.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:55.1712036Z #47 460.1 ^ 2025-09-07T09:23:55.1712475Z #47 460.1 2025-09-07T09:23:55.1713183Z #47 460.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:55.1714031Z #47 460.1 2025-09-07T09:23:55.1716449Z #47 460.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:55.1719613Z #47 460.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:55.1720487Z #47 460.1 ^ 2025-09-07T09:23:55.1720952Z #47 460.1 2025-09-07T09:23:55.1721680Z #47 460.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:55.1722540Z #47 460.1 2025-09-07T09:23:55.1724814Z #47 460.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:55.1727578Z #47 460.1 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:55.1728388Z #47 460.1 ^ 2025-09-07T09:23:55.1728781Z #47 460.1 2025-09-07T09:23:55.1729424Z #47 460.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:55.1730211Z 
#47 460.1 2025-09-07T09:23:55.3740094Z #47 460.1 [218/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:23:55.3761716Z #47 460.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:55.3765101Z #47 460.1 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:55.3766202Z #47 460.1 ^ 2025-09-07T09:23:55.3766726Z #47 460.1 2025-09-07T09:23:55.3767540Z #47 460.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:55.3768581Z #47 460.1 2025-09-07T09:23:55.3771484Z #47 460.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:55.3775225Z #47 460.1 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:55.3776203Z #47 460.1 ^ 2025-09-07T09:23:55.3776695Z #47 460.1 2025-09-07T09:23:55.3777516Z #47 460.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:55.3778526Z #47 460.1 2025-09-07T09:23:55.3781380Z #47 460.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:55.3784890Z #47 460.1 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:55.3785888Z #47 460.1 ^ 2025-09-07T09:23:55.3786391Z #47 460.1 2025-09-07T09:23:55.3786988Z #47 460.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:55.3787705Z 
#47 460.1 2025-09-07T09:23:55.3790404Z #47 460.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:55.3793677Z #47 460.1 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:55.3794459Z #47 460.1 ^ 2025-09-07T09:23:55.3794867Z #47 460.1 2025-09-07T09:23:55.3795548Z #47 460.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:55.3796391Z #47 460.1 2025-09-07T09:23:55.3799077Z #47 460.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:55.3802473Z #47 460.1 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:23:55.3803390Z #47 460.1 ^ 2025-09-07T09:23:55.3803888Z #47 460.1 2025-09-07T09:23:55.3804623Z #47 460.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:55.3805564Z #47 460.1 2025-09-07T09:23:56.0472273Z #47 461.0 [219/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 
-D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:23:57.5038080Z #47 462.4 [220/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:23:57.5056611Z #47 462.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:57.5059985Z #47 462.4 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:57.5060899Z #47 462.4 ^ 2025-09-07T09:23:57.5061340Z #47 462.4 2025-09-07T09:23:57.5062056Z #47 462.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:57.5063004Z #47 462.4 2025-09-07T09:23:57.5065755Z #47 
462.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:57.5068659Z #47 462.4 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:57.5069644Z #47 462.4 ^ 2025-09-07T09:23:57.5070095Z #47 462.4 2025-09-07T09:23:57.5070852Z #47 462.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:57.5071802Z #47 462.4 2025-09-07T09:23:57.5074359Z #47 462.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:57.5077314Z #47 462.4 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:57.5078123Z #47 462.4 ^ 2025-09-07T09:23:57.5078569Z #47 462.4 2025-09-07T09:23:57.5079278Z #47 462.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:57.5080150Z #47 462.4 2025-09-07T09:23:57.5082378Z #47 462.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:57.5085234Z #47 462.4 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:57.5086049Z #47 462.4 ^ 2025-09-07T09:23:57.5086488Z #47 462.4 2025-09-07T09:23:57.5087132Z #47 462.4 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:23:57.5087935Z #47 462.4 2025-09-07T09:23:57.5090432Z #47 462.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:57.5093981Z #47 462.4 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:23:57.5094910Z #47 462.4 ^ 2025-09-07T09:23:57.5095376Z #47 462.4 2025-09-07T09:23:57.5096013Z #47 462.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:57.5096930Z #47 462.4 2025-09-07T09:23:57.8847846Z #47 462.8 [221/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:23:58.1259134Z #47 463.0 [222/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 
-gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:23:58.2750210Z #47 463.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:58.2753385Z #47 463.0 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:58.2754208Z #47 463.0 ^ 2025-09-07T09:23:58.2754594Z #47 463.0 2025-09-07T09:23:58.2755202Z #47 463.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:58.2755964Z #47 463.0 2025-09-07T09:23:58.2758113Z #47 463.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:58.2763517Z #47 463.0 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:58.2764277Z #47 463.0 ^ 2025-09-07T09:23:58.2764659Z #47 463.0 2025-09-07T09:23:58.2765431Z #47 463.0 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:23:58.2766292Z #47 463.0 2025-09-07T09:23:58.2769034Z #47 463.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:58.2771885Z #47 463.0 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:58.2773206Z #47 463.0 ^ 2025-09-07T09:23:58.2773695Z #47 463.0 2025-09-07T09:23:58.2774463Z #47 463.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:58.2775465Z #47 463.0 2025-09-07T09:23:58.2777922Z #47 463.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:58.2781040Z #47 463.0 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:58.2781854Z #47 463.0 ^ 2025-09-07T09:23:58.2782333Z #47 463.0 2025-09-07T09:23:58.2783027Z #47 463.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:58.2783769Z #47 463.0 2025-09-07T09:23:58.2785920Z #47 463.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:23:58.2788649Z #47 463.0 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:23:58.2789491Z #47 463.0 ^ 2025-09-07T09:23:58.2789928Z #47 463.0 
2025-09-07T09:23:58.2790587Z #47 463.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:23:58.2791403Z #47 463.0 2025-09-07T09:23:58.2947209Z #47 463.2 [223/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:24:01.3830790Z #47 466.3 [224/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cu -o 
batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o 2025-09-07T09:24:01.3849756Z #47 466.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cu(115): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:24:01.3852639Z #47 466.3 bool use_swa = window_left != -1; 2025-09-07T09:24:01.3853309Z #47 466.3 ^ 2025-09-07T09:24:01.3853735Z #47 466.3 2025-09-07T09:24:01.3854482Z #47 466.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:01.3855365Z #47 466.3 2025-09-07T09:24:02.4151647Z #47 467.3 [225/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include 
--compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o 2025-09-07T09:24:02.5643475Z #47 467.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cu(115): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:24:02.5646804Z #47 467.3 bool use_swa = window_left != -1; 2025-09-07T09:24:02.5647526Z #47 467.3 ^ 2025-09-07T09:24:02.5647950Z #47 467.3 2025-09-07T09:24:02.5648923Z #47 467.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:02.5649835Z #47 467.3 2025-09-07T09:24:03.6992156Z #47 468.6 [226/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T09:24:03.8787015Z #47 468.6 [227/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:24:04.6877877Z #47 469.6 [228/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o 2025-09-07T09:24:04.8382838Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8407063Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.8429361Z #47 469.6 ^ 2025-09-07T09:24:04.8429921Z #47 469.6 2025-09-07T09:24:04.8430684Z #47 469.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:04.8431668Z #47 469.6 2025-09-07T09:24:04.8434163Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8457811Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.8479615Z #47 469.6 ^ 2025-09-07T09:24:04.8480118Z #47 469.6 2025-09-07T09:24:04.8482803Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8606797Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.8629108Z #47 469.6 ^ 2025-09-07T09:24:04.8629650Z #47 469.6 2025-09-07T09:24:04.8632130Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8656125Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.8678012Z #47 469.6 ^ 2025-09-07T09:24:04.8678555Z #47 469.6 2025-09-07T09:24:04.8681058Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8704923Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.8727430Z #47 469.6 ^ 2025-09-07T09:24:04.8727959Z #47 469.6 2025-09-07T09:24:04.8730464Z #47 469.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8754915Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.8777409Z #47 469.6 ^ 2025-09-07T09:24:04.8777945Z #47 469.6 2025-09-07T09:24:04.8780585Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8804851Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.8827426Z #47 469.6 ^ 2025-09-07T09:24:04.8827952Z #47 469.6 2025-09-07T09:24:04.8830642Z #47 469.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8855158Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.8877606Z #47 469.6 ^ 2025-09-07T09:24:04.8878132Z #47 469.6 2025-09-07T09:24:04.8880647Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8903638Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.8924784Z #47 469.6 ^ 2025-09-07T09:24:04.8925292Z #47 469.6 2025-09-07T09:24:04.8926109Z #47 469.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:04.8927000Z #47 469.6 2025-09-07T09:24:04.8929349Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8951183Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.8972045Z #47 469.6 ^ 2025-09-07T09:24:04.8972756Z #47 469.6 2025-09-07T09:24:04.8975341Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.8998010Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9020002Z #47 469.6 ^ 2025-09-07T09:24:04.9020562Z #47 469.6 2025-09-07T09:24:04.9023275Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9047195Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9083622Z #47 469.6 ^ 2025-09-07T09:24:04.9084176Z #47 469.6 2025-09-07T09:24:04.9086831Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9111587Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9134572Z #47 469.6 ^ 2025-09-07T09:24:04.9135110Z #47 469.6 2025-09-07T09:24:04.9137559Z #47 469.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9160949Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9182757Z #47 469.6 ^ 2025-09-07T09:24:04.9183314Z #47 469.6 2025-09-07T09:24:04.9185622Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9208239Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9229935Z #47 469.6 ^ 2025-09-07T09:24:04.9230466Z #47 469.6 2025-09-07T09:24:04.9232866Z #47 469.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9257034Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9278285Z #47 469.6 ^ 2025-09-07T09:24:04.9278818Z #47 469.6 2025-09-07T09:24:04.9281253Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9306738Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9327944Z #47 469.6 ^ 2025-09-07T09:24:04.9328482Z #47 469.6 2025-09-07T09:24:04.9329200Z #47 469.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:04.9330123Z #47 469.6 2025-09-07T09:24:04.9332592Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9355234Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9377226Z #47 469.6 ^ 2025-09-07T09:24:04.9377774Z #47 469.6 2025-09-07T09:24:04.9380170Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9403373Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9424819Z #47 469.6 ^ 2025-09-07T09:24:04.9425341Z #47 469.6 2025-09-07T09:24:04.9427756Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9451003Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9472547Z #47 469.6 ^ 2025-09-07T09:24:04.9473115Z #47 469.6 2025-09-07T09:24:04.9475524Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9499692Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9521392Z #47 469.6 ^ 2025-09-07T09:24:04.9521930Z #47 469.6 2025-09-07T09:24:04.9524321Z #47 469.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9548656Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9570213Z #47 469.6 ^ 2025-09-07T09:24:04.9570764Z #47 469.6 2025-09-07T09:24:04.9573429Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9597000Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9618233Z #47 469.6 ^ 2025-09-07T09:24:04.9618753Z #47 469.6 2025-09-07T09:24:04.9621035Z #47 469.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9644815Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9667152Z #47 469.6 ^ 2025-09-07T09:24:04.9667662Z #47 469.6 2025-09-07T09:24:04.9670182Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9693757Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9715210Z #47 469.6 ^ 2025-09-07T09:24:04.9715765Z #47 469.6 2025-09-07T09:24:04.9716537Z #47 469.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:04.9717490Z #47 469.6 2025-09-07T09:24:04.9719947Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9742652Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9763939Z #47 469.6 ^ 2025-09-07T09:24:04.9764466Z #47 469.6 2025-09-07T09:24:04.9766947Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9790097Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9811264Z #47 469.6 ^ 2025-09-07T09:24:04.9811773Z #47 469.6 2025-09-07T09:24:04.9814553Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9837890Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9858689Z #47 469.6 ^ 2025-09-07T09:24:04.9859222Z #47 469.6 2025-09-07T09:24:04.9861665Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9885389Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9908167Z #47 469.6 ^ 2025-09-07T09:24:04.9908676Z #47 469.6 2025-09-07T09:24:04.9911091Z #47 469.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9934413Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:04.9956142Z #47 469.6 ^ 2025-09-07T09:24:04.9956658Z #47 469.6 2025-09-07T09:24:04.9959124Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:04.9983342Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:05.0004721Z #47 469.6 ^ 2025-09-07T09:24:05.0005262Z #47 469.6 2025-09-07T09:24:05.0007954Z #47 469.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:05.0031006Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:05.0053036Z #47 469.6 ^ 2025-09-07T09:24:05.0053522Z #47 469.6 2025-09-07T09:24:05.0056221Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:05.0079064Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:05.0100707Z #47 469.6 ^ 2025-09-07T09:24:05.0101227Z #47 469.6 2025-09-07T09:24:05.0102016Z #47 469.6 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:24:05.0102891Z #47 469.6 2025-09-07T09:24:05.0105202Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:05.0128405Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:05.0149986Z #47 469.6 ^ 2025-09-07T09:24:05.0150486Z #47 469.6 2025-09-07T09:24:05.0152842Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:05.0176024Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:05.0197118Z #47 469.6 ^ 2025-09-07T09:24:05.0197583Z #47 469.6 2025-09-07T09:24:05.0200007Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:05.0222732Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:05.0244454Z #47 469.6 ^ 2025-09-07T09:24:05.0244961Z #47 469.6 2025-09-07T09:24:05.0247634Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:05.0270654Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:05.0292260Z #47 469.6 ^ 2025-09-07T09:24:05.0292889Z #47 469.6 2025-09-07T09:24:05.0295187Z #47 469.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:05.0318792Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:05.0340899Z #47 469.6 ^ 2025-09-07T09:24:05.0341414Z #47 469.6 2025-09-07T09:24:05.0343783Z #47 469.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:05.0367261Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:05.0389066Z #47 469.6 ^ 2025-09-07T09:24:05.0389564Z #47 469.6 2025-09-07T09:24:05.0392228Z #47 469.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:05.0414945Z #47 469.6 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:05.0436805Z #47 469.6 ^ 2025-09-07T09:24:05.0437538Z #47 469.6 2025-09-07T09:24:05.0455023Z #47 469.8 [229/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:24:08.9985464Z #47 473.9 [230/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:24:11.4247122Z #47 476.3 [231/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:24:12.3602431Z #47 477.3 [232/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:24:14.1837504Z #47 479.1 [233/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem 
/workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:24:14.1859186Z #47 479.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:14.1862770Z #47 479.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:24:14.1863878Z #47 479.1 ^ 2025-09-07T09:24:14.1864427Z #47 479.1 2025-09-07T09:24:14.1865135Z #47 479.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:14.1866111Z #47 479.1 2025-09-07T09:24:14.1868865Z #47 479.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:14.1872334Z #47 479.1 constexpr auto use_custom_mask = 
MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:24:14.1873449Z #47 479.1 ^ 2025-09-07T09:24:14.1873901Z #47 479.1 2025-09-07T09:24:14.1874639Z #47 479.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:14.1875709Z #47 479.1 2025-09-07T09:24:14.1878461Z #47 479.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:14.1882194Z #47 479.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:24:14.1883166Z #47 479.1 ^ 2025-09-07T09:24:14.1883626Z #47 479.1 2025-09-07T09:24:14.1884474Z #47 479.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:14.1885506Z #47 479.1 2025-09-07T09:24:14.1888463Z #47 479.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:14.1893141Z #47 479.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:24:14.1894245Z #47 479.1 ^ 2025-09-07T09:24:14.1894757Z #47 479.1 2025-09-07T09:24:14.1895584Z #47 479.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:14.1896625Z #47 479.1 2025-09-07T09:24:14.1899688Z #47 479.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never 
referenced 2025-09-07T09:24:14.1903145Z #47 479.1 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:24:14.1904243Z #47 479.1 ^ 2025-09-07T09:24:14.1904770Z #47 479.1 2025-09-07T09:24:14.1905580Z #47 479.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:14.1906618Z #47 479.1 2025-09-07T09:24:15.3438315Z #47 480.3 [234/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o 2025-09-07T09:24:15.3457823Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.3482343Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.3505069Z #47 480.3 ^ 2025-09-07T09:24:15.3505622Z #47 480.3 2025-09-07T09:24:15.3506448Z #47 480.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:15.3507472Z #47 480.3 2025-09-07T09:24:15.3510093Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.3532663Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.3552872Z #47 480.3 ^ 2025-09-07T09:24:15.3553366Z #47 480.3 2025-09-07T09:24:15.3555656Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.3577330Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.3597498Z #47 480.3 ^ 2025-09-07T09:24:15.3597999Z #47 480.3 2025-09-07T09:24:15.3600286Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.3621821Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.3641530Z #47 480.3 ^ 2025-09-07T09:24:15.3642031Z #47 480.3 2025-09-07T09:24:15.3644310Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.3667537Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.3688878Z #47 480.3 ^ 2025-09-07T09:24:15.3689365Z #47 480.3 2025-09-07T09:24:15.3691688Z #47 480.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.3788380Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.3809995Z #47 480.3 ^ 2025-09-07T09:24:15.3810524Z #47 480.3 2025-09-07T09:24:15.3813177Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.3835987Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.3857349Z #47 480.3 ^ 2025-09-07T09:24:15.3857843Z #47 480.3 2025-09-07T09:24:15.3860225Z #47 480.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.3883002Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.3905258Z #47 480.3 ^ 2025-09-07T09:24:15.3905766Z #47 480.3 2025-09-07T09:24:15.3908297Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.3930476Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.3951917Z #47 480.3 ^ 2025-09-07T09:24:15.3952412Z #47 480.3 2025-09-07T09:24:15.3953142Z #47 480.3 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:24:15.3954056Z #47 480.3 2025-09-07T09:24:15.3956397Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.3979145Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4000497Z #47 480.3 ^ 2025-09-07T09:24:15.4000998Z #47 480.3 2025-09-07T09:24:15.4003370Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4025700Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4046464Z #47 480.3 ^ 2025-09-07T09:24:15.4046973Z #47 480.3 2025-09-07T09:24:15.4049544Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4071637Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4093273Z #47 480.3 ^ 2025-09-07T09:24:15.4093799Z #47 480.3 2025-09-07T09:24:15.4096185Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4119255Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4139914Z #47 480.3 ^ 2025-09-07T09:24:15.4140440Z #47 480.3 2025-09-07T09:24:15.4142933Z #47 480.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4165872Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4187542Z #47 480.3 ^ 2025-09-07T09:24:15.4188020Z #47 480.3 2025-09-07T09:24:15.4190364Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4213345Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4233922Z #47 480.3 ^ 2025-09-07T09:24:15.4235464Z #47 480.3 2025-09-07T09:24:15.4237819Z #47 480.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4260127Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4280677Z #47 480.3 ^ 2025-09-07T09:24:15.4281196Z #47 480.3 2025-09-07T09:24:15.4283746Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4307052Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4328700Z #47 480.3 ^ 2025-09-07T09:24:15.4329245Z #47 480.3 2025-09-07T09:24:15.4330050Z #47 480.3 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:24:15.4330992Z #47 480.3 2025-09-07T09:24:15.4333567Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4354758Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4373330Z #47 480.3 ^ 2025-09-07T09:24:15.4373819Z #47 480.3 2025-09-07T09:24:15.4375840Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4396088Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4414781Z #47 480.3 ^ 2025-09-07T09:24:15.4415265Z #47 480.3 2025-09-07T09:24:15.4417124Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4435580Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4454251Z #47 480.3 ^ 2025-09-07T09:24:15.4454711Z #47 480.3 2025-09-07T09:24:15.4456996Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4476648Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4495473Z #47 480.3 ^ 2025-09-07T09:24:15.4495950Z #47 480.3 2025-09-07T09:24:15.4498214Z #47 480.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4520987Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4542332Z #47 480.3 ^ 2025-09-07T09:24:15.4542825Z #47 480.3 2025-09-07T09:24:15.4545169Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4566750Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4587794Z #47 480.3 ^ 2025-09-07T09:24:15.4588347Z #47 480.3 2025-09-07T09:24:15.4590647Z #47 480.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4613424Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4633916Z #47 480.3 ^ 2025-09-07T09:24:15.4634400Z #47 480.3 2025-09-07T09:24:15.4636695Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4658716Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4678667Z #47 480.3 ^ 2025-09-07T09:24:15.4679157Z #47 480.3 2025-09-07T09:24:15.4679883Z #47 480.3 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:24:15.4680982Z #47 480.3 2025-09-07T09:24:15.4683209Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4704280Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4723916Z #47 480.3 ^ 2025-09-07T09:24:15.4724466Z #47 480.3 2025-09-07T09:24:15.4726804Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4750311Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4771371Z #47 480.3 ^ 2025-09-07T09:24:15.4771899Z #47 480.3 2025-09-07T09:24:15.4774468Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4798113Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4820844Z #47 480.3 ^ 2025-09-07T09:24:15.4821395Z #47 480.3 2025-09-07T09:24:15.4823968Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4848334Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4871617Z #47 480.3 ^ 2025-09-07T09:24:15.4872327Z #47 480.3 2025-09-07T09:24:15.4875083Z #47 480.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4899585Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4920730Z #47 480.3 ^ 2025-09-07T09:24:15.4921225Z #47 480.3 2025-09-07T09:24:15.4923365Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4946472Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.4967316Z #47 480.3 ^ 2025-09-07T09:24:15.4967794Z #47 480.3 2025-09-07T09:24:15.4969919Z #47 480.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.4992825Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.5013888Z #47 480.3 ^ 2025-09-07T09:24:15.5014404Z #47 480.3 2025-09-07T09:24:15.5016999Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.5039303Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.5059632Z #47 480.3 ^ 2025-09-07T09:24:15.5060110Z #47 480.3 2025-09-07T09:24:15.5060772Z #47 480.3 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:24:15.5061676Z #47 480.3 2025-09-07T09:24:15.5064345Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.5085852Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.5105958Z #47 480.3 ^ 2025-09-07T09:24:15.5106465Z #47 480.3 2025-09-07T09:24:15.5108784Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.5130574Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.5150804Z #47 480.3 ^ 2025-09-07T09:24:15.5151309Z #47 480.3 2025-09-07T09:24:15.5153636Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.5175599Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.5196152Z #47 480.3 ^ 2025-09-07T09:24:15.5196718Z #47 480.3 2025-09-07T09:24:15.5199428Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.5224844Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.5246143Z #47 480.3 ^ 2025-09-07T09:24:15.5246691Z #47 480.3 2025-09-07T09:24:15.5249441Z #47 480.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.5274601Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.5298265Z #47 480.3 ^ 2025-09-07T09:24:15.5298813Z #47 480.3 2025-09-07T09:24:15.5301386Z #47 480.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.5325702Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.5348634Z #47 480.3 ^ 2025-09-07T09:24:15.5349180Z #47 480.3 2025-09-07T09:24:15.5351860Z #47 480.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:24:15.5377603Z #47 480.3 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:24:15.5402154Z #47 480.3 ^ 2025-09-07T09:24:15.5402638Z #47 480.3 2025-09-07T09:24:17.9849033Z #47 482.9 [235/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:24:18.8097270Z #47 483.7 [236/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:24:24.0494004Z #47 489.0 [237/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:24:24.9446743Z #47 489.9 [238/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:24:28.0519222Z #47 493.0 [239/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include 
--compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:24:34.1189536Z #47 499.0 [240/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC 
--expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:24:34.7267643Z #47 499.6 [241/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false 
-gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:24:34.7285919Z #47 499.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:34.7288957Z #47 499.6 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:24:34.7289836Z #47 499.6 ^ 2025-09-07T09:24:34.7290299Z #47 499.6 2025-09-07T09:24:34.7291008Z #47 499.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:34.7292192Z #47 499.6 2025-09-07T09:24:34.7294827Z #47 499.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:34.7298116Z #47 499.6 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:24:34.7298918Z #47 499.6 ^ 2025-09-07T09:24:34.7299372Z #47 499.6 
2025-09-07T09:24:34.7300055Z #47 499.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:34.7300915Z #47 499.6 2025-09-07T09:24:34.7303605Z #47 499.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:34.7306629Z #47 499.6 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:24:34.7307578Z #47 499.6 ^ 2025-09-07T09:24:34.7308032Z #47 499.6 2025-09-07T09:24:34.7308744Z #47 499.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:34.7309644Z #47 499.6 2025-09-07T09:24:34.7312194Z #47 499.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:34.7315365Z #47 499.6 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:24:34.7316192Z #47 499.6 ^ 2025-09-07T09:24:34.7316663Z #47 499.6 2025-09-07T09:24:34.7317373Z #47 499.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:34.7318308Z #47 499.6 2025-09-07T09:24:34.7320832Z #47 499.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:34.7323799Z #47 499.6 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:24:34.7324672Z 
#47 499.6 ^ 2025-09-07T09:24:34.7325133Z #47 499.6 2025-09-07T09:24:34.7325883Z #47 499.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:34.7326807Z #47 499.6 2025-09-07T09:24:37.8450713Z #47 502.8 [242/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:24:37.8469368Z #47 502.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:37.8494513Z #47 502.8 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:24:37.8594673Z #47 502.8 ^ 2025-09-07T09:24:37.8595328Z #47 502.8 2025-09-07T09:24:37.8596121Z #47 502.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:37.8597186Z #47 502.8 2025-09-07T09:24:37.8600211Z #47 502.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:37.8604244Z #47 502.8 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:24:37.8605233Z #47 502.8 ^ 2025-09-07T09:24:37.8605710Z #47 502.8 2025-09-07T09:24:37.8606550Z #47 502.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:37.8607556Z #47 502.8 2025-09-07T09:24:37.8610319Z #47 502.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:37.8613882Z #47 502.8 constexpr auto use_custom_mask = MaskMode::kNone == 
MaskMode::kCustom; 2025-09-07T09:24:37.8614828Z #47 502.8 ^ 2025-09-07T09:24:37.8615304Z #47 502.8 2025-09-07T09:24:37.8616117Z #47 502.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:37.8617126Z #47 502.8 2025-09-07T09:24:37.8620043Z #47 502.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:37.8623469Z #47 502.8 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:24:37.8624446Z #47 502.8 ^ 2025-09-07T09:24:37.8624968Z #47 502.8 2025-09-07T09:24:37.8625791Z #47 502.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:37.8626800Z #47 502.8 2025-09-07T09:24:37.8629590Z #47 502.8 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:37.8633002Z #47 502.8 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:24:37.8633987Z #47 502.8 ^ 2025-09-07T09:24:37.8634499Z #47 502.8 2025-09-07T09:24:37.8635327Z #47 502.8 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:37.8636328Z #47 502.8 2025-09-07T09:24:38.7612371Z #47 503.7 [243/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:24:38.9684327Z #47 503.9 [244/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:24:40.1716249Z #47 505.1 [245/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:24:42.7997096Z #47 507.7 [246/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:24:43.0153522Z #47 507.8 [247/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:24:45.0944608Z #47 510.0 [248/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:24:45.0965515Z #47 510.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:45.0969120Z #47 510.0 
constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:24:45.0970140Z #47 510.0 ^ 2025-09-07T09:24:45.0970660Z #47 510.0 2025-09-07T09:24:45.0971471Z #47 510.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:45.0972989Z #47 510.0 2025-09-07T09:24:45.0975855Z #47 510.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:45.0979196Z #47 510.0 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:24:45.0980135Z #47 510.0 ^ 2025-09-07T09:24:45.0980674Z #47 510.0 2025-09-07T09:24:45.0981692Z #47 510.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:45.0982612Z #47 510.0 2025-09-07T09:24:45.0985791Z #47 510.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:45.0989254Z #47 510.0 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:24:45.0990261Z #47 510.0 ^ 2025-09-07T09:24:45.0990736Z #47 510.0 2025-09-07T09:24:45.0991672Z #47 510.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:45.0993038Z #47 510.0 2025-09-07T09:24:45.0995817Z #47 510.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but 
never referenced 2025-09-07T09:24:45.0999159Z #47 510.0 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:24:45.1000124Z #47 510.0 ^ 2025-09-07T09:24:45.1000621Z #47 510.0 2025-09-07T09:24:45.1001433Z #47 510.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:45.1002436Z #47 510.0 2025-09-07T09:24:45.1005282Z #47 510.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:45.1008749Z #47 510.0 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:24:45.1009651Z #47 510.0 ^ 2025-09-07T09:24:45.1010121Z #47 510.0 2025-09-07T09:24:45.1010964Z #47 510.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:45.1011976Z #47 510.0 2025-09-07T09:24:46.3639177Z #47 511.3 [249/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:24:46.3658455Z #47 511.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:46.3661568Z #47 511.3 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:24:46.3662506Z #47 511.3 ^ 2025-09-07T09:24:46.3663011Z #47 511.3 2025-09-07T09:24:46.3663792Z #47 511.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:46.3664985Z #47 511.3 2025-09-07T09:24:46.3667782Z #47 511.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 
2025-09-07T09:24:46.3671081Z #47 511.3 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:24:46.3672005Z #47 511.3 ^ 2025-09-07T09:24:46.3672505Z #47 511.3 2025-09-07T09:24:46.3673272Z #47 511.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:46.3674230Z #47 511.3 2025-09-07T09:24:46.3676963Z #47 511.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:46.3680229Z #47 511.3 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:24:46.3681166Z #47 511.3 ^ 2025-09-07T09:24:46.3681652Z #47 511.3 2025-09-07T09:24:46.3682403Z #47 511.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:46.3683351Z #47 511.3 2025-09-07T09:24:46.3686090Z #47 511.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:46.3689125Z #47 511.3 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:24:46.3689981Z #47 511.3 ^ 2025-09-07T09:24:46.3690452Z #47 511.3 2025-09-07T09:24:46.3691185Z #47 511.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:46.3692491Z #47 511.3 2025-09-07T09:24:46.3694662Z #47 511.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable 
"flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:46.3697334Z #47 511.3 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:24:46.3698433Z #47 511.3 ^ 2025-09-07T09:24:46.3698851Z #47 511.3 2025-09-07T09:24:46.3699532Z #47 511.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:46.3700356Z #47 511.3 2025-09-07T09:24:46.6645910Z #47 511.6 [250/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:24:46.6665199Z #47 511.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:46.6668481Z #47 511.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:24:46.6669369Z #47 511.6 ^ 2025-09-07T09:24:46.6669852Z #47 511.6 2025-09-07T09:24:46.6670574Z #47 511.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:46.6671484Z #47 511.6 2025-09-07T09:24:46.6674055Z #47 511.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:46.6677221Z #47 511.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:24:46.6678097Z #47 511.6 ^ 2025-09-07T09:24:46.6678530Z #47 511.6 2025-09-07T09:24:46.6679346Z #47 511.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:46.6680248Z #47 511.6 2025-09-07T09:24:46.6683007Z #47 511.6 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:46.6686306Z #47 511.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:24:46.6687473Z #47 511.6 ^ 2025-09-07T09:24:46.6687992Z #47 511.6 2025-09-07T09:24:46.6688793Z #47 511.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:46.6689809Z #47 511.6 2025-09-07T09:24:46.6692870Z #47 511.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:46.6695466Z #47 511.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:24:46.6696198Z #47 511.6 ^ 2025-09-07T09:24:46.6696672Z #47 511.6 2025-09-07T09:24:46.6697276Z #47 511.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:46.6698006Z #47 511.6 2025-09-07T09:24:46.6700088Z #47 511.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:46.6702680Z #47 511.6 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:24:46.6703383Z #47 511.6 ^ 2025-09-07T09:24:46.6703780Z #47 511.6 2025-09-07T09:24:46.6704381Z #47 511.6 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:24:46.6705115Z #47 511.6 2025-09-07T09:24:47.3785479Z #47 512.3 [251/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:24:47.3805285Z #47 512.3 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:47.3808675Z #47 512.3 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:24:47.3809920Z #47 512.3 ^ 2025-09-07T09:24:47.3810439Z #47 512.3 2025-09-07T09:24:47.3811184Z #47 512.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:47.3812176Z #47 512.3 2025-09-07T09:24:47.3815114Z #47 512.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:47.3819731Z #47 512.3 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:24:47.3820718Z #47 512.3 ^ 2025-09-07T09:24:47.3821377Z #47 512.3 2025-09-07T09:24:47.3822158Z #47 512.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:47.3823172Z #47 512.3 2025-09-07T09:24:47.3825959Z #47 512.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:47.3829091Z #47 512.3 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:24:47.3830202Z #47 512.3 ^ 2025-09-07T09:24:47.3830687Z #47 512.3 2025-09-07T09:24:47.3831445Z #47 512.3 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:24:47.3832411Z #47 512.3 2025-09-07T09:24:47.3835416Z #47 512.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:47.3838709Z #47 512.3 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:24:47.3839609Z #47 512.3 ^ 2025-09-07T09:24:47.3840103Z #47 512.3 2025-09-07T09:24:47.3840827Z #47 512.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:47.3841776Z #47 512.3 2025-09-07T09:24:47.3844480Z #47 512.3 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:47.3847374Z #47 512.3 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:24:47.3848171Z #47 512.3 ^ 2025-09-07T09:24:47.3848619Z #47 512.3 2025-09-07T09:24:47.3849361Z #47 512.3 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:47.3850295Z #47 512.3 2025-09-07T09:24:48.7891755Z #47 513.7 [252/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:24:51.0423946Z #47 516.0 [253/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:24:53.6851864Z #47 518.6 [254/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:24:56.5907815Z #47 521.5 [255/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:24:56.5925961Z #47 521.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:56.5929412Z #47 521.5 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:24:56.5930418Z #47 521.5 ^ 2025-09-07T09:24:56.5930927Z #47 521.5 2025-09-07T09:24:56.5931687Z #47 521.5 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:24:56.5932801Z #47 521.5 2025-09-07T09:24:56.5935530Z #47 521.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:56.5938837Z #47 521.5 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:24:56.5939828Z #47 521.5 ^ 2025-09-07T09:24:56.5940586Z #47 521.5 2025-09-07T09:24:56.5941334Z #47 521.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:56.5942246Z #47 521.5 2025-09-07T09:24:56.5944825Z #47 521.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:56.5948267Z #47 521.5 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:24:56.5949257Z #47 521.5 ^ 2025-09-07T09:24:56.5949720Z #47 521.5 2025-09-07T09:24:56.5950490Z #47 521.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:56.5951533Z #47 521.5 2025-09-07T09:24:56.5954293Z #47 521.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:56.5957561Z #47 521.5 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:24:56.5958538Z #47 521.5 ^ 2025-09-07T09:24:56.5959144Z #47 521.5 
2025-09-07T09:24:56.5959888Z #47 521.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:56.5960807Z #47 521.5 2025-09-07T09:24:56.5963486Z #47 521.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:24:56.5966962Z #47 521.5 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:24:56.5967971Z #47 521.5 ^ 2025-09-07T09:24:56.5968457Z #47 521.5 2025-09-07T09:24:56.5969204Z #47 521.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:24:56.5970126Z #47 521.5 2025-09-07T09:25:04.1822027Z #47 529.1 [256/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr 
-static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o 2025-09-07T09:25:04.1841267Z #47 529.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:04.1844584Z #47 529.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:25:04.1845658Z #47 529.1 ^ 2025-09-07T09:25:04.1846126Z #47 529.1 2025-09-07T09:25:04.1846963Z #47 529.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:04.1847884Z #47 529.1 2025-09-07T09:25:04.1850494Z #47 529.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:04.1853768Z #47 529.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:25:04.1854625Z #47 529.1 ^ 
2025-09-07T09:25:04.1855229Z #47 529.1 2025-09-07T09:25:04.1855998Z #47 529.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:04.1856978Z #47 529.1 2025-09-07T09:25:04.1859507Z #47 529.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:04.1862811Z #47 529.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:25:04.1863749Z #47 529.1 ^ 2025-09-07T09:25:04.1864254Z #47 529.1 2025-09-07T09:25:04.1865058Z #47 529.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:04.1866022Z #47 529.1 2025-09-07T09:25:04.1868814Z #47 529.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:04.1871955Z #47 529.1 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:25:04.1872736Z #47 529.1 ^ 2025-09-07T09:25:04.1873131Z #47 529.1 2025-09-07T09:25:04.1873775Z #47 529.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:04.1874612Z #47 529.1 2025-09-07T09:25:04.1877056Z #47 529.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:04.1880145Z #47 529.1 constexpr auto use_custom_mask = MaskMode::kNone == 
MaskMode::kCustom; 2025-09-07T09:25:04.1881016Z #47 529.1 ^ 2025-09-07T09:25:04.1881379Z #47 529.1 2025-09-07T09:25:04.1881957Z #47 529.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:04.1882735Z #47 529.1 2025-09-07T09:25:04.6450656Z #47 529.6 [257/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o 2025-09-07T09:25:04.6473799Z #47 529.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:04.6477964Z #47 529.6 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:25:04.6479208Z #47 529.6 ^ 2025-09-07T09:25:04.6480214Z #47 529.6 2025-09-07T09:25:04.6481082Z #47 529.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:04.6482057Z #47 529.6 2025-09-07T09:25:04.6484912Z #47 529.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:04.6488324Z #47 529.6 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:25:04.6489280Z #47 529.6 ^ 2025-09-07T09:25:04.6489806Z #47 529.6 2025-09-07T09:25:04.6490587Z #47 529.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:04.6491596Z #47 529.6 2025-09-07T09:25:04.6494846Z #47 529.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:04.6498135Z #47 529.6 constexpr auto use_custom_mask = MaskMode::kNone == 
MaskMode::kCustom; 2025-09-07T09:25:04.6499080Z #47 529.6 ^ 2025-09-07T09:25:04.6499553Z #47 529.6 2025-09-07T09:25:04.6500318Z #47 529.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:04.6501254Z #47 529.6 2025-09-07T09:25:04.6503945Z #47 529.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:04.6507397Z #47 529.6 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:25:04.6508260Z #47 529.6 ^ 2025-09-07T09:25:04.6508753Z #47 529.6 2025-09-07T09:25:04.6509496Z #47 529.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:04.6510441Z #47 529.6 2025-09-07T09:25:04.6513066Z #47 529.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:04.6516303Z #47 529.6 constexpr auto use_custom_mask = MaskMode::kNone == MaskMode::kCustom; 2025-09-07T09:25:04.6517350Z #47 529.6 ^ 2025-09-07T09:25:04.6517810Z #47 529.6 2025-09-07T09:25:04.6518572Z #47 529.6 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:04.6519523Z #47 529.6 2025-09-07T09:25:08.7774117Z #47 533.7 [258/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:25:09.0042063Z #47 533.9 [259/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:25:09.0063746Z #47 533.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:09.0067574Z #47 533.9 
constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:09.0068906Z #47 533.9 ^ 2025-09-07T09:25:09.0069466Z #47 533.9 2025-09-07T09:25:09.0070294Z #47 533.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:09.0071320Z #47 533.9 2025-09-07T09:25:09.0088871Z #47 533.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:09.0093251Z #47 533.9 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:09.0094326Z #47 533.9 ^ 2025-09-07T09:25:09.0094834Z #47 533.9 2025-09-07T09:25:09.0095674Z #47 533.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:09.0096708Z #47 533.9 2025-09-07T09:25:09.0099659Z #47 533.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:09.0103216Z #47 533.9 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:09.0104302Z #47 533.9 ^ 2025-09-07T09:25:09.0104829Z #47 533.9 2025-09-07T09:25:09.0105650Z #47 533.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:09.0106680Z #47 533.9 2025-09-07T09:25:09.0109739Z #47 533.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable 
"flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:09.0113353Z #47 533.9 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:09.0114372Z #47 533.9 ^ 2025-09-07T09:25:09.0114861Z #47 533.9 2025-09-07T09:25:09.0115643Z #47 533.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:09.0116655Z #47 533.9 2025-09-07T09:25:09.0119850Z #47 533.9 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:09.0123626Z #47 533.9 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:09.0124649Z #47 533.9 ^ 2025-09-07T09:25:09.0125149Z #47 533.9 2025-09-07T09:25:09.0125949Z #47 533.9 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:09.0126934Z #47 533.9 2025-09-07T09:25:09.4539894Z #47 534.4 [260/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem 
/workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o 2025-09-07T09:25:09.4560468Z #47 534.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:09.4565596Z #47 534.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:25:09.4566554Z #47 534.4 ^ 2025-09-07T09:25:09.4567185Z #47 534.4 2025-09-07T09:25:09.4567932Z #47 534.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:09.4568885Z #47 534.4 2025-09-07T09:25:09.4571425Z #47 534.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): 
warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:09.4574832Z #47 534.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:25:09.4575722Z #47 534.4 ^ 2025-09-07T09:25:09.4576181Z #47 534.4 2025-09-07T09:25:09.4576941Z #47 534.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:09.4577816Z #47 534.4 2025-09-07T09:25:09.4580513Z #47 534.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:09.4584103Z #47 534.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:25:09.4585060Z #47 534.4 ^ 2025-09-07T09:25:09.4585550Z #47 534.4 2025-09-07T09:25:09.4586306Z #47 534.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:09.4587251Z #47 534.4 2025-09-07T09:25:09.4590054Z #47 534.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:09.4593742Z #47 534.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:25:09.4594668Z #47 534.4 ^ 2025-09-07T09:25:09.4595181Z #47 534.4 2025-09-07T09:25:09.4595950Z #47 534.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:09.4596892Z #47 534.4 2025-09-07T09:25:09.4599841Z #47 534.4 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:09.4603037Z #47 534.4 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:25:09.4603867Z #47 534.4 ^ 2025-09-07T09:25:09.4604323Z #47 534.4 2025-09-07T09:25:09.4605018Z #47 534.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:09.4605904Z #47 534.4 2025-09-07T09:25:11.2640370Z #47 536.2 [261/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 
-O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o 2025-09-07T09:25:11.2656129Z #47 536.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:11.2658813Z #47 536.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:25:11.2659484Z #47 536.2 ^ 2025-09-07T09:25:11.2659873Z #47 536.2 2025-09-07T09:25:11.2660434Z #47 536.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:11.2661141Z #47 536.2 2025-09-07T09:25:11.2663413Z #47 536.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:11.2665870Z #47 536.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:25:11.2666596Z #47 536.2 ^ 2025-09-07T09:25:11.2666981Z #47 536.2 2025-09-07T09:25:11.2667556Z #47 536.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:11.2668281Z #47 536.2 2025-09-07T09:25:11.2670418Z #47 536.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:11.2672786Z #47 536.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:25:11.2673478Z #47 536.2 ^ 2025-09-07T09:25:11.2673873Z #47 536.2 2025-09-07T09:25:11.2674447Z #47 536.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:11.2675173Z #47 536.2 2025-09-07T09:25:11.2677255Z #47 536.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:11.2679573Z #47 536.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:25:11.2680242Z #47 536.2 ^ 2025-09-07T09:25:11.2680597Z #47 536.2 2025-09-07T09:25:11.2681167Z #47 536.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:11.2681845Z #47 536.2 2025-09-07T09:25:11.2683818Z #47 536.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:11.2686240Z #47 536.2 constexpr auto use_custom_mask = MaskMode::kCausal == MaskMode::kCustom; 2025-09-07T09:25:11.2686921Z #47 536.2 ^ 2025-09-07T09:25:11.2687301Z #47 536.2 2025-09-07T09:25:11.2687870Z #47 536.2 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:25:11.2688581Z #47 536.2 2025-09-07T09:25:11.4543453Z #47 536.2 [262/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o 2025-09-07T09:25:11.4562094Z #47 536.2 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:11.4565032Z #47 536.2 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:25:11.4565886Z #47 536.2 ^ 2025-09-07T09:25:11.4566349Z #47 536.2 2025-09-07T09:25:11.4567095Z #47 536.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:11.4567975Z #47 536.2 2025-09-07T09:25:11.4570531Z #47 536.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:11.4573686Z #47 536.2 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:25:11.4574517Z #47 536.2 ^ 2025-09-07T09:25:11.4574971Z #47 536.2 2025-09-07T09:25:11.4575679Z #47 536.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:11.4576524Z #47 536.2 2025-09-07T09:25:11.4579058Z #47 536.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:11.4582167Z #47 536.2 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:25:11.4583038Z #47 536.2 ^ 2025-09-07T09:25:11.4583515Z #47 536.2 2025-09-07T09:25:11.4584229Z #47 536.2 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:25:11.4585081Z #47 536.2 2025-09-07T09:25:11.4587681Z #47 536.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:11.4590786Z #47 536.2 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:25:11.4591664Z #47 536.2 ^ 2025-09-07T09:25:11.4592431Z #47 536.2 2025-09-07T09:25:11.4593161Z #47 536.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:11.4594379Z #47 536.2 2025-09-07T09:25:11.4596836Z #47 536.2 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:11.4599731Z #47 536.2 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:25:11.4600600Z #47 536.2 ^ 2025-09-07T09:25:11.4601028Z #47 536.2 2025-09-07T09:25:11.4601904Z #47 536.2 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:11.4602784Z #47 536.2 2025-09-07T09:25:12.0391348Z #47 537.0 [263/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o 2025-09-07T09:25:12.0409775Z #47 537.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:12.0413172Z #47 537.0 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:12.0414170Z #47 537.0 ^ 2025-09-07T09:25:12.0414645Z #47 537.0 2025-09-07T09:25:12.0415355Z #47 537.0 Remark: The warnings can be suppressed with "-diag-suppress 
" 2025-09-07T09:25:12.0416296Z #47 537.0 2025-09-07T09:25:12.0418948Z #47 537.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:12.0422072Z #47 537.0 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:12.0422926Z #47 537.0 ^ 2025-09-07T09:25:12.0423384Z #47 537.0 2025-09-07T09:25:12.0424076Z #47 537.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:12.0425273Z #47 537.0 2025-09-07T09:25:12.0427780Z #47 537.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:12.0431015Z #47 537.0 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:12.0431958Z #47 537.0 ^ 2025-09-07T09:25:12.0432423Z #47 537.0 2025-09-07T09:25:12.0433334Z #47 537.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:12.0434198Z #47 537.0 2025-09-07T09:25:12.0436749Z #47 537.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:12.0440028Z #47 537.0 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:12.0440967Z #47 537.0 ^ 2025-09-07T09:25:12.0441436Z #47 537.0 
2025-09-07T09:25:12.0442153Z #47 537.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:12.0443168Z #47 537.0 2025-09-07T09:25:12.0445679Z #47 537.0 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:12.0448849Z #47 537.0 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:12.0449732Z #47 537.0 ^ 2025-09-07T09:25:12.0450141Z #47 537.0 2025-09-07T09:25:12.0450790Z #47 537.0 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:12.0451667Z #47 537.0 2025-09-07T09:25:14.5070156Z #47 539.4 [264/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 
-std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o 2025-09-07T09:25:14.5087082Z #47 539.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:25:14.5089830Z #47 539.4 bool use_swa = window_left != -1; 2025-09-07T09:25:14.5090460Z #47 539.4 ^ 2025-09-07T09:25:14.5090850Z #47 539.4 2025-09-07T09:25:14.5091576Z #47 539.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:14.5092839Z #47 539.4 2025-09-07T09:25:14.5095172Z #47 539.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:25:14.5097865Z #47 539.4 bool use_swa = window_left != -1; 2025-09-07T09:25:14.5098451Z #47 539.4 ^ 2025-09-07T09:25:14.5098871Z #47 539.4 2025-09-07T09:25:14.7224282Z #47 539.5 [265/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o 2025-09-07T09:25:14.7241722Z #47 539.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:14.7244657Z #47 539.5 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:25:14.7245411Z #47 539.5 ^ 2025-09-07T09:25:14.7245792Z #47 539.5 2025-09-07T09:25:14.7246481Z #47 539.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:14.7247298Z #47 539.5 2025-09-07T09:25:14.7249791Z #47 539.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:14.7252847Z #47 539.5 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:25:14.7254032Z #47 539.5 ^ 2025-09-07T09:25:14.7254471Z #47 539.5 2025-09-07T09:25:14.7255155Z #47 539.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:14.7255983Z #47 539.5 2025-09-07T09:25:14.7258625Z #47 539.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:14.7261516Z #47 539.5 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:25:14.7262506Z #47 539.5 ^ 2025-09-07T09:25:14.7262956Z #47 539.5 2025-09-07T09:25:14.7263646Z #47 539.5 Remark: The warnings can be suppressed with "-diag-suppress " 
2025-09-07T09:25:14.7264530Z #47 539.5 2025-09-07T09:25:14.7266959Z #47 539.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:14.7270013Z #47 539.5 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:25:14.7270867Z #47 539.5 ^ 2025-09-07T09:25:14.7271312Z #47 539.5 2025-09-07T09:25:14.7272013Z #47 539.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:14.7272847Z #47 539.5 2025-09-07T09:25:14.7275319Z #47 539.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:14.7278245Z #47 539.5 constexpr auto use_custom_mask = MaskMode::kCustom == MaskMode::kCustom; 2025-09-07T09:25:14.7279075Z #47 539.5 ^ 2025-09-07T09:25:14.7279529Z #47 539.5 2025-09-07T09:25:14.7280220Z #47 539.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:14.7281085Z #47 539.5 2025-09-07T09:25:15.2634592Z #47 540.2 [266/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T09:25:17.5206250Z #47 542.4 [267/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:25:18.1676112Z #47 543.1 [268/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include 
-isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o 2025-09-07T09:25:18.1701129Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.1733823Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.1763372Z #47 543.1 ^ 2025-09-07T09:25:18.1764134Z #47 543.1 2025-09-07T09:25:18.1765139Z #47 543.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:18.1766418Z #47 543.1 2025-09-07T09:25:18.1769683Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.1798246Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.1818277Z #47 543.1 ^ 2025-09-07T09:25:18.1818788Z #47 543.1 2025-09-07T09:25:18.1821117Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.1842112Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.1862053Z #47 543.1 ^ 2025-09-07T09:25:18.1862553Z #47 543.1 2025-09-07T09:25:18.1864959Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.1886416Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.1905377Z #47 543.1 ^ 2025-09-07T09:25:18.1905875Z #47 543.1 2025-09-07T09:25:18.1908442Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.1930794Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.1952070Z #47 543.1 ^ 2025-09-07T09:25:18.1952628Z #47 543.1 2025-09-07T09:25:18.1955063Z #47 543.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.1977825Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2002863Z #47 543.1 ^ 2025-09-07T09:25:18.2003543Z #47 543.1 2025-09-07T09:25:18.2006915Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2039008Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2066972Z #47 543.1 ^ 2025-09-07T09:25:18.2067404Z #47 543.1 2025-09-07T09:25:18.2069551Z #47 543.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2088455Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2108303Z #47 543.1 ^ 2025-09-07T09:25:18.2108764Z #47 543.1 2025-09-07T09:25:18.2110767Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2128935Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2145729Z #47 543.1 ^ 2025-09-07T09:25:18.2146153Z #47 543.1 2025-09-07T09:25:18.2146762Z #47 543.1 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:25:18.2147518Z #47 543.1 2025-09-07T09:25:18.2149962Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2181583Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ?
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2204063Z #47 543.1 ^ 2025-09-07T09:25:18.2204556Z #47 543.1 2025-09-07T09:25:18.2206915Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2228578Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2248604Z #47 543.1 ^ 2025-09-07T09:25:18.2249077Z #47 543.1 2025-09-07T09:25:18.2251557Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2272521Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2292607Z #47 543.1 ^ 2025-09-07T09:25:18.2293077Z #47 543.1 2025-09-07T09:25:18.2295609Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2317237Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2336778Z #47 543.1 ^ 2025-09-07T09:25:18.2337244Z #47 543.1 2025-09-07T09:25:18.2339474Z #47 543.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2362633Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2392745Z #47 543.1 ^ 2025-09-07T09:25:18.2393449Z #47 543.1 2025-09-07T09:25:18.2396665Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2424699Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2454343Z #47 543.1 ^ 2025-09-07T09:25:18.2455012Z #47 543.1 2025-09-07T09:25:18.2458302Z #47 543.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2490408Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2511337Z #47 543.1 ^ 2025-09-07T09:25:18.2511785Z #47 543.1 2025-09-07T09:25:18.2514286Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2534858Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2553855Z #47 543.1 ^ 2025-09-07T09:25:18.2554331Z #47 543.1 2025-09-07T09:25:18.2555071Z #47 543.1 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:25:18.2555984Z #47 543.1 2025-09-07T09:25:18.2558127Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2579572Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2598981Z #47 543.1 ^ 2025-09-07T09:25:18.2599460Z #47 543.1 2025-09-07T09:25:18.2603558Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2624775Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2644438Z #47 543.1 ^ 2025-09-07T09:25:18.2644923Z #47 543.1 2025-09-07T09:25:18.2647143Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2667702Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2687135Z #47 543.1 ^ 2025-09-07T09:25:18.2687609Z #47 543.1 2025-09-07T09:25:18.2689953Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2710396Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2728189Z #47 543.1 ^ 2025-09-07T09:25:18.2728627Z #47 543.1 2025-09-07T09:25:18.2730575Z #47 543.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2749975Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2767208Z #47 543.1 ^ 2025-09-07T09:25:18.2767651Z #47 543.1 2025-09-07T09:25:18.2769592Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2788429Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2806418Z #47 543.1 ^ 2025-09-07T09:25:18.2806859Z #47 543.1 2025-09-07T09:25:18.2808794Z #47 543.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2827871Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2844856Z #47 543.1 ^ 2025-09-07T09:25:18.2845311Z #47 543.1 2025-09-07T09:25:18.2847465Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2865876Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2882755Z #47 543.1 ^ 2025-09-07T09:25:18.2883211Z #47 543.1 2025-09-07T09:25:18.2883975Z #47 543.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:18.2884752Z #47 543.1 2025-09-07T09:25:18.2886617Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2907068Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2926858Z #47 543.1 ^ 2025-09-07T09:25:18.2927351Z #47 543.1 2025-09-07T09:25:18.2929821Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2951289Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.2971971Z #47 543.1 ^ 2025-09-07T09:25:18.2972630Z #47 543.1 2025-09-07T09:25:18.2974914Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.2997322Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3017579Z #47 543.1 ^ 2025-09-07T09:25:18.3018162Z #47 543.1 2025-09-07T09:25:18.3020230Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3043432Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3064140Z #47 543.1 ^ 2025-09-07T09:25:18.3064649Z #47 543.1 2025-09-07T09:25:18.3066911Z #47 543.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3089485Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3107724Z #47 543.1 ^ 2025-09-07T09:25:18.3108175Z #47 543.1 2025-09-07T09:25:18.3110215Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3128757Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3146480Z #47 543.1 ^ 2025-09-07T09:25:18.3146934Z #47 543.1 2025-09-07T09:25:18.3148820Z #47 543.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3167856Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3185553Z #47 543.1 ^ 2025-09-07T09:25:18.3186043Z #47 543.1 2025-09-07T09:25:18.3188186Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3206242Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3223337Z #47 543.1 ^ 2025-09-07T09:25:18.3223781Z #47 543.1 2025-09-07T09:25:18.3224494Z #47 543.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:18.3225275Z #47 543.1 2025-09-07T09:25:18.3227440Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3245488Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3262393Z #47 543.1 ^ 2025-09-07T09:25:18.3262809Z #47 543.1 2025-09-07T09:25:18.3264769Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3283168Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3301029Z #47 543.1 ^ 2025-09-07T09:25:18.3301526Z #47 543.1 2025-09-07T09:25:18.3303823Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3325251Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3345178Z #47 543.1 ^ 2025-09-07T09:25:18.3345682Z #47 543.1 2025-09-07T09:25:18.3347966Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3369673Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3390012Z #47 543.1 ^ 2025-09-07T09:25:18.3390537Z #47 543.1 2025-09-07T09:25:18.3393507Z #47 543.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3415562Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3435948Z #47 543.1 ^ 2025-09-07T09:25:18.3436430Z #47 543.1 2025-09-07T09:25:18.3438774Z #47 543.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3461420Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3482516Z #47 543.1 ^ 2025-09-07T09:25:18.3483012Z #47 543.1 2025-09-07T09:25:18.3485309Z #47 543.1 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:18.3505771Z #47 543.1 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:18.3523103Z #47 543.1 ^ 2025-09-07T09:25:18.3523547Z #47 543.1 2025-09-07T09:25:22.5944438Z #47 547.5 [269/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include 
--compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T09:25:24.4087762Z #47 549.3 [270/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:25:25.6237523Z #47 550.5 [271/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 
-DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o 2025-09-07T09:25:25.6255055Z #47 550.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:25:25.6257601Z #47 550.5 bool use_swa = window_left != -1; 2025-09-07T09:25:25.6258234Z #47 550.5 ^ 2025-09-07T09:25:25.6258669Z #47 550.5 2025-09-07T09:25:25.6259369Z #47 550.5 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:25:25.6260268Z #47 550.5 2025-09-07T09:25:25.6262407Z #47 550.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:25:25.6265236Z #47 550.5 bool use_swa = window_left != -1; 2025-09-07T09:25:25.6265829Z #47 550.5 ^ 2025-09-07T09:25:25.6266201Z #47 550.5 2025-09-07T09:25:25.8247385Z #47 550.6 [272/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o 2025-09-07T09:25:25.8266509Z #47 550.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:25.8269592Z #47 550.6 
constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:25.8270570Z #47 550.6 ^ 2025-09-07T09:25:25.8271035Z #47 550.6 2025-09-07T09:25:25.8271729Z #47 550.6 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:25:25.8272620Z #47 550.6 2025-09-07T09:25:25.8275101Z #47 550.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:25.8278162Z #47 550.6 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:25.8279100Z #47 550.6 ^ 2025-09-07T09:25:25.8279554Z #47 550.6 2025-09-07T09:25:25.8280256Z #47 550.6 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:25:25.8281101Z #47 550.6 2025-09-07T09:25:25.8283609Z #47 550.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:25.8287030Z #47 550.6 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:25.8287936Z #47 550.6 ^ 2025-09-07T09:25:25.8288380Z #47 550.6 2025-09-07T09:25:25.8289074Z #47 550.6 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:25:25.8289955Z #47 550.6 2025-09-07T09:25:25.8293019Z #47 550.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable 
"flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:25.8296077Z #47 550.6 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:25.8297258Z #47 550.6 ^ 2025-09-07T09:25:25.8297695Z #47 550.6 2025-09-07T09:25:25.8298399Z #47 550.6 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:25:25.8299290Z #47 550.6 2025-09-07T09:25:25.8301823Z #47 550.6 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cu(6): warning #177-D: variable "flashinfer::use_custom_mask" was declared but never referenced 2025-09-07T09:25:25.8305073Z #47 550.6 constexpr auto use_custom_mask = MaskMode::kMultiItemScoring == MaskMode::kCustom; 2025-09-07T09:25:25.8306020Z #47 550.6 ^ 2025-09-07T09:25:25.8306483Z #47 550.6 2025-09-07T09:25:25.8307186Z #47 550.6 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-09-07T09:25:25.8308079Z #47 550.6 2025-09-07T09:25:26.1531632Z #47 551.1 [273/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include 
-isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:25:26.2880160Z #47 551.2 [274/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem 
/workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:25:28.8043214Z #47 553.7 [275/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem 
/workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o 2025-09-07T09:25:30.0302962Z #47 554.9 [276/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem 
/workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:25:31.4714457Z #47 556.4 [277/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem 
/workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:25:32.5692681Z #47 557.5 [278/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o 2025-09-07T09:25:32.5712107Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.5733193Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.5752863Z #47 557.5 ^ 2025-09-07T09:25:32.5753373Z #47 557.5 2025-09-07T09:25:32.5754091Z #47 557.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:32.5755007Z #47 557.5 2025-09-07T09:25:32.5757376Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.5778733Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.5799054Z #47 557.5 ^ 2025-09-07T09:25:32.5799578Z #47 557.5 2025-09-07T09:25:32.5801922Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.5823311Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.5843242Z #47 557.5 ^ 2025-09-07T09:25:32.5843745Z #47 557.5 2025-09-07T09:25:32.5846010Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.5867205Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.5887262Z #47 557.5 ^ 2025-09-07T09:25:32.5887750Z #47 557.5 2025-09-07T09:25:32.5890176Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.5913413Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.5934384Z #47 557.5 ^ 2025-09-07T09:25:32.5934933Z #47 557.5 2025-09-07T09:25:32.5937566Z #47 557.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.5959934Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.5981021Z #47 557.5 ^ 2025-09-07T09:25:32.5981541Z #47 557.5 2025-09-07T09:25:32.5984136Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6014761Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6036675Z #47 557.5 ^ 2025-09-07T09:25:32.6037166Z #47 557.5 2025-09-07T09:25:32.6039599Z #47 557.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6062353Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6084134Z #47 557.5 ^ 2025-09-07T09:25:32.6084645Z #47 557.5 2025-09-07T09:25:32.6087196Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6108475Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6127537Z #47 557.5 ^ 2025-09-07T09:25:32.6128006Z #47 557.5 2025-09-07T09:25:32.6128965Z #47 557.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:32.6129801Z #47 557.5 2025-09-07T09:25:32.6132167Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6152606Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6171642Z #47 557.5 ^ 2025-09-07T09:25:32.6172124Z #47 557.5 2025-09-07T09:25:32.6174385Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6195491Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6216969Z #47 557.5 ^ 2025-09-07T09:25:32.6217496Z #47 557.5 2025-09-07T09:25:32.6219984Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6242338Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6264463Z #47 557.5 ^ 2025-09-07T09:25:32.6265008Z #47 557.5 2025-09-07T09:25:32.6267548Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6290710Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6312732Z #47 557.5 ^ 2025-09-07T09:25:32.6313284Z #47 557.5 2025-09-07T09:25:32.6315884Z #47 557.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6340269Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6363177Z #47 557.5 ^ 2025-09-07T09:25:32.6363688Z #47 557.5 2025-09-07T09:25:32.6366330Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6390218Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6410376Z #47 557.5 ^ 2025-09-07T09:25:32.6410854Z #47 557.5 2025-09-07T09:25:32.6413148Z #47 557.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6434358Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6454600Z #47 557.5 ^ 2025-09-07T09:25:32.6455084Z #47 557.5 2025-09-07T09:25:32.6457230Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6478060Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6498495Z #47 557.5 ^ 2025-09-07T09:25:32.6499031Z #47 557.5 2025-09-07T09:25:32.6499749Z #47 557.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:32.6500651Z #47 557.5 2025-09-07T09:25:32.6503007Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6524214Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6543237Z #47 557.5 ^ 2025-09-07T09:25:32.6543708Z #47 557.5 2025-09-07T09:25:32.6545754Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6566498Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6587324Z #47 557.5 ^ 2025-09-07T09:25:32.6587792Z #47 557.5 2025-09-07T09:25:32.6589963Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6613410Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6635295Z #47 557.5 ^ 2025-09-07T09:25:32.6635837Z #47 557.5 2025-09-07T09:25:32.6638461Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6664465Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6688595Z #47 557.5 ^ 2025-09-07T09:25:32.6689142Z #47 557.5 2025-09-07T09:25:32.6691787Z #47 557.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6716339Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6738289Z #47 557.5 ^ 2025-09-07T09:25:32.6738821Z #47 557.5 2025-09-07T09:25:32.6741304Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6764876Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6786942Z #47 557.5 ^ 2025-09-07T09:25:32.6787458Z #47 557.5 2025-09-07T09:25:32.6790127Z #47 557.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6814613Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6836705Z #47 557.5 ^ 2025-09-07T09:25:32.6837240Z #47 557.5 2025-09-07T09:25:32.6839408Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6863250Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6884648Z #47 557.5 ^ 2025-09-07T09:25:32.6885066Z #47 557.5 2025-09-07T09:25:32.6885652Z #47 557.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:32.6886531Z #47 557.5 2025-09-07T09:25:32.6888588Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6912107Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6934690Z #47 557.5 ^ 2025-09-07T09:25:32.6935219Z #47 557.5 2025-09-07T09:25:32.6937665Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.6961040Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.6982548Z #47 557.5 ^ 2025-09-07T09:25:32.6983038Z #47 557.5 2025-09-07T09:25:32.6985645Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7042501Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7064374Z #47 557.5 ^ 2025-09-07T09:25:32.7064947Z #47 557.5 2025-09-07T09:25:32.7067375Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7090695Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7113272Z #47 557.5 ^ 2025-09-07T09:25:32.7113791Z #47 557.5 2025-09-07T09:25:32.7116200Z #47 557.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7139522Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7162294Z #47 557.5 ^ 2025-09-07T09:25:32.7162808Z #47 557.5 2025-09-07T09:25:32.7165438Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7187452Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7209088Z #47 557.5 ^ 2025-09-07T09:25:32.7209658Z #47 557.5 2025-09-07T09:25:32.7212136Z #47 557.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7236925Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7258306Z #47 557.5 ^ 2025-09-07T09:25:32.7258872Z #47 557.5 2025-09-07T09:25:32.7261553Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7284495Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7306636Z #47 557.5 ^ 2025-09-07T09:25:32.7307162Z #47 557.5 2025-09-07T09:25:32.7307866Z #47 557.5 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:25:32.7308753Z #47 557.5 2025-09-07T09:25:32.7310965Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7333771Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7354728Z #47 557.5 ^ 2025-09-07T09:25:32.7355436Z #47 557.5 2025-09-07T09:25:32.7357846Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7381328Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7403349Z #47 557.5 ^ 2025-09-07T09:25:32.7403897Z #47 557.5 2025-09-07T09:25:32.7406304Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(118): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7428916Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { RaggedParams params; params.q = static_cast(q.data_ptr()); params.k = static_cast(k.data_ptr()); params.v = static_cast(v.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.kv_indptr = static_cast(kv_indptr.data_ptr()); params.num_qo_heads = num_qo_heads; params.num_kv_heads = num_kv_heads; params.group_size = uint_fastdiv(num_qo_heads / num_kv_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.k_stride_n = k_stride_n; params.k_stride_h = k_stride_h; params.v_stride_n = v_stride_n; params.v_stride_h = v_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { 
params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7451121Z #47 557.5 ^ 2025-09-07T09:25:32.7451625Z #47 557.5 2025-09-07T09:25:32.7453821Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7474872Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7494696Z #47 557.5 ^ 2025-09-07T09:25:32.7495169Z #47 557.5 2025-09-07T09:25:32.7497291Z #47 557.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7520267Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7542545Z #47 557.5 ^ 2025-09-07T09:25:32.7543000Z #47 557.5 2025-09-07T09:25:32.7545360Z #47 557.5 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7565803Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? 
static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7585193Z #47 557.5 ^ 2025-09-07T09:25:32.7585839Z #47 557.5 2025-09-07T09:25:32.7588035Z #47 557.5 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cu(249): warning #177-D: variable "use_custom_mask" was declared but never referenced 2025-09-07T09:25:32.7610054Z #47 557.5 { constexpr auto use_custom_mask = MASK_MODE == MaskMode::kCustom; using AttentionVariant = AttentionSink; [&] { PagedParams params; params.q = static_cast(q.data_ptr()); paged_kv_t paged_kv( num_kv_heads, page_size, HEAD_DIM_VO, batch_size, kv_layout, static_cast(paged_k_cache.data_ptr()), static_cast(paged_v_cache.data_ptr()), kv_cache_strides, static_cast(paged_kv_indices.data_ptr()), static_cast(paged_kv_indptr.data_ptr()), static_cast(paged_kv_last_page_len.data_ptr())); params.paged_kv = paged_kv; params.q_indptr = static_cast(qo_indptr.data_ptr()); params.o = static_cast(o.data_ptr()); params.lse = maybe_lse ? static_cast(maybe_lse->data_ptr()) : nullptr; params.num_qo_heads = num_qo_heads; params.group_size = uint_fastdiv(num_qo_heads / paged_kv.num_heads); params.q_stride_n = q_stride_n; params.q_stride_h = q_stride_h; params.window_left = window_left; params.request_indices = nullptr; params.qo_tile_indices = nullptr; params.kv_tile_indices = nullptr; params.merge_indptr = nullptr; params.o_indptr = nullptr; params.kv_chunk_size_ptr = nullptr; params.block_valid_mask = nullptr; params.total_num_rows = nullptr; params.max_total_num_rows = 0; params.padded_batch_size = 0; params.partition_kv = false; params.sink = static_cast(sink.data_ptr()); params.sm_scale = sm_scale; DTypeO* tmp_v = nullptr; float* tmp_s = nullptr; params.request_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.request_indices_offset); params.qo_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.qo_tile_indices_offset); params.kv_tile_indices = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_tile_indices_offset); params.o_indptr = 
GetPtrFromBaseOffset(int_buffer_ptr, plan_info.o_indptr_offset); params.kv_chunk_size_ptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.kv_chunk_size_ptr_offset); if (plan_info.split_kv) { params.merge_indptr = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.merge_indptr_offset); tmp_v = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.v_offset); tmp_s = GetPtrFromBaseOffset(float_buffer_ptr, plan_info.s_offset); if (plan_info.enable_cuda_graph) { params.block_valid_mask = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.block_valid_mask_offset); } } params.padded_batch_size = plan_info.padded_batch_size; params.max_total_num_rows = plan_info.total_num_rows; if (plan_info.enable_cuda_graph) { params.total_num_rows = GetPtrFromBaseOffset(int_buffer_ptr, plan_info.total_num_rows_offset); } cudaError_t status = cudaSuccess; 2025-09-07T09:25:32.7630177Z #47 557.5 ^ 2025-09-07T09:25:32.7630644Z #47 557.5 2025-09-07T09:25:34.9881980Z #47 559.9 [279/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:25:35.3145603Z #47 560.2 [280/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:25:38.3806077Z #47 563.3 [281/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:25:55.3548580Z #47 580.3 [282/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem 
/workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:25:59.6645086Z #47 584.6 [283/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:26:00.6903956Z #47 585.6 [284/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:26:02.5819503Z #47 587.5 [285/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output fmha_cutlass_sm100a/fmha_cutlass_sm100_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=fmha_cutlass_sm100a -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_100a,code=sm_100a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/csrc/fmha_cutlass_sm100_pybind.cu -o fmha_cutlass_sm100a/fmha_cutlass_sm100_pybind.cuda.o 2025-09-07T09:26:03.2856533Z #47 588.2 [286/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output fmha_cutlass_sm100a/blackwell_fmha_plan.cuda.o.d 
-DTORCH_EXTENSION_NAME=fmha_cutlass_sm100a -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_100a,code=sm_100a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/csrc/blackwell_fmha_plan.cu -o fmha_cutlass_sm100a/blackwell_fmha_plan.cuda.o 2025-09-07T09:26:03.5364676Z #47 588.3 [287/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include 
-isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o 2025-09-07T09:26:03.5534873Z #47 588.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu:16: 2025-09-07T09:26:03.5540026Z #47 588.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:03.5544296Z #47 588.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:03.5545771Z #47 588.3 | 2025-09-07T09:26:03.5547827Z #47 588.3 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu:16: 2025-09-07T09:26:03.5551412Z #47 588.3 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:03.5554587Z #47 588.3 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:03.5556272Z #47 588.3 | 2025-09-07T09:26:03.5863745Z #47 588.5 [288/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:26:04.9775759Z #47 589.9 [289/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:26:06.5206704Z #47 591.4 [290/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o 2025-09-07T09:26:06.5223806Z #47 591.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu:19: 2025-09-07T09:26:06.5227745Z #47 591.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:06.5230891Z #47 591.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:06.5232260Z #47 591.4 | 2025-09-07T09:26:06.5234549Z #47 591.4 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu:19: 2025-09-07T09:26:06.5238387Z #47 591.4 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:06.5241454Z #47 591.4 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:06.5242897Z #47 591.4 | 2025-09-07T09:26:06.9792240Z #47 591.9 [291/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o 2025-09-07T09:26:06.9810289Z #47 591.9 In file included from 
/workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:06.9813967Z #47 591.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:06.9816952Z #47 591.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:06.9818206Z #47 591.9 | 2025-09-07T09:26:06.9820180Z #47 591.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:06.9823457Z #47 591.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:06.9826056Z #47 591.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:06.9827372Z #47 591.9 | 2025-09-07T09:26:06.9829503Z #47 591.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:06.9832748Z #47 591.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:06.9835822Z #47 591.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:06.9837199Z #47 591.9 | 2025-09-07T09:26:06.9839407Z #47 591.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:06.9843099Z #47 591.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:06.9846142Z #47 591.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:06.9847758Z #47 591.9 | 2025-09-07T09:26:06.9849652Z #47 591.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:06.9853155Z #47 591.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:06.9856363Z #47 591.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:06.9857725Z #47 591.9 | 2025-09-07T09:26:06.9860040Z #47 591.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:06.9864148Z #47 591.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:06.9867398Z #47 591.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:06.9868896Z #47 591.9 | 2025-09-07T09:26:07.1020912Z #47 592.0 [292/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T09:26:07.3116842Z #47 592.1 [293/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile 
--dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:26:08.4344607Z #47 593.3 [294/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:26:08.8979694Z #47 593.8 [295/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o 2025-09-07T09:26:08.8999100Z #47 593.8 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:08.9002914Z #47 593.8 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: 
backslash-newline at end of file 2025-09-07T09:26:08.9006128Z #47 593.8 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:08.9007564Z #47 593.8 | 2025-09-07T09:26:08.9009804Z #47 593.8 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:08.9013790Z #47 593.8 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:08.9016621Z #47 593.8 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:08.9018038Z #47 593.8 | 2025-09-07T09:26:08.9020127Z #47 593.8 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:08.9023902Z #47 593.8 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:08.9026717Z #47 593.8 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:08.9028062Z #47 593.8 | 2025-09-07T09:26:08.9030284Z #47 593.8 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:08.9033934Z #47 593.8 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:08.9036713Z #47 593.8 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:08.9038238Z #47 593.8 | 2025-09-07T09:26:08.9040140Z #47 593.8 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:08.9043937Z #47 593.8 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:08.9046887Z #47 593.8 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:08.9048409Z #47 593.8 | 2025-09-07T09:26:08.9050653Z #47 593.8 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:08.9054728Z #47 593.8 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:08.9057570Z #47 593.8 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:08.9058966Z #47 593.8 | 2025-09-07T09:26:09.4967969Z #47 594.4 [296/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 
-DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o 2025-09-07T09:26:09.4986563Z #47 594.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:26:09.4989323Z #47 594.4 bool use_swa = window_left != -1; 2025-09-07T09:26:09.4989933Z #47 594.4 ^ 2025-09-07T09:26:09.4990698Z #47 594.4 2025-09-07T09:26:09.4991461Z #47 594.4 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:26:09.4992599Z #47 594.4 2025-09-07T09:26:09.4994939Z #47 594.4 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:26:09.4997584Z #47 594.4 bool use_swa = window_left != -1; 2025-09-07T09:26:09.4998188Z #47 594.4 ^ 2025-09-07T09:26:09.4998769Z #47 594.4 2025-09-07T09:26:10.0445577Z #47 595.0 [297/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:26:13.3177463Z #47 598.2 [298/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:26:14.0057078Z #47 598.9 [299/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:26:14.9794138Z #47 599.9 [300/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:26:15.3594024Z #47 600.3 [301/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:26:16.1577674Z #47 601.1 [302/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H 
-DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o 2025-09-07T09:26:16.1596720Z #47 601.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:16.1600736Z #47 601.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:16.1606227Z #47 601.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, 
...) \ 2025-09-07T09:26:16.1607665Z #47 601.1 | 2025-09-07T09:26:16.1609978Z #47 601.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:16.1614505Z #47 601.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:16.1617740Z #47 601.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:16.1619171Z #47 601.1 | 2025-09-07T09:26:16.1621485Z #47 601.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:16.1625635Z #47 601.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:16.1628699Z #47 601.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:16.1630114Z #47 601.1 | 2025-09-07T09:26:16.1632416Z #47 601.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:16.1636449Z #47 601.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:16.1639553Z #47 601.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:16.1641005Z #47 601.1 | 2025-09-07T09:26:16.1643315Z #47 601.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:16.1647249Z #47 601.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:16.1650370Z #47 601.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:16.1651788Z #47 601.1 | 2025-09-07T09:26:16.1654297Z #47 601.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cu:19: 2025-09-07T09:26:16.1658311Z #47 601.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:16.1661653Z #47 601.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:16.1663060Z #47 601.1 | 2025-09-07T09:26:16.3997862Z #47 601.3 [303/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:26:16.6992725Z #47 601.6 [304/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 
-DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o 2025-09-07T09:26:16.7010964Z #47 601.6 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu:19: 2025-09-07T09:26:16.7015431Z #47 601.6 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:16.7018478Z #47 601.6 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:16.7020087Z #47 601.6 | 2025-09-07T09:26:16.7022489Z #47 601.6 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cu:19: 2025-09-07T09:26:16.7026583Z #47 601.6 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:16.7029747Z #47 601.6 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:16.7031310Z #47 601.6 | 2025-09-07T09:26:17.2340261Z #47 602.1 [305/412] c++ logging/logging.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o logging/logging.so 2025-09-07T09:26:17.4302763Z #47 602.2 [306/412] c++ -MMD -MF trtllm_utils/stringUtils.o.d -DTORCH_EXTENSION_NAME=trtllm_utils -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/csrc/nv_internal -I/workspace/flashinfer/csrc/nv_internal/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include -fPIC -O3 -std=c++17 -Wno-switch-bool -c /workspace/flashinfer/csrc/nv_internal/cpp/common/stringUtils.cpp -o trtllm_utils/stringUtils.o 2025-09-07T09:26:17.5180803Z #47 602.4 [307/412] c++ -MMD -MF trtllm_utils/tllmException.o.d -DTORCH_EXTENSION_NAME=trtllm_utils -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/csrc/nv_internal -I/workspace/flashinfer/csrc/nv_internal/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 
-I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include -fPIC -O3 -std=c++17 -Wno-switch-bool -c /workspace/flashinfer/csrc/nv_internal/cpp/common/tllmException.cpp -o trtllm_utils/tllmException.o 2025-09-07T09:26:17.7380600Z #47 602.5 [308/412] c++ single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 
-lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:18.2656801Z #47 603.2 [309/412] c++ single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:18.4345212Z #47 603.3 [310/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:18.6452829Z #47 603.4 [311/412] c++ batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:18.6472961Z #47 603.6 [312/412] c++ single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:19.3466160Z #47 604.3 [313/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:19.4609573Z #47 604.3 [314/412] c++ 
batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:19.4632973Z #47 604.4 [315/412] c++ single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:19.6666468Z #47 604.4 [316/412] c++ single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:20.3484865Z #47 605.3 [317/412] c++ single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 
single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:20.5217545Z #47 605.4 [318/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:20.6785106Z #47 605.5 [319/412] c++ batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 
batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:20.6806764Z #47 605.6 [320/412] c++ single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:21.0109014Z #47 605.9 [321/412] c++ -MMD -MF trtllm_utils/logger.o.d -DTORCH_EXTENSION_NAME=trtllm_utils -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/csrc/nv_internal -I/workspace/flashinfer/csrc/nv_internal/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include -fPIC -O3 -std=c++17 -Wno-switch-bool -c /workspace/flashinfer/csrc/nv_internal/cpp/common/logger.cpp -o trtllm_utils/logger.o 2025-09-07T09:26:21.3533301Z #47 606.3 [322/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:26:21.5043131Z #47 606.3 [323/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:21.5068687Z #47 606.4 [324/412] c++ -MMD -MF trtllm_utils/envUtils.o.d -DTORCH_EXTENSION_NAME=trtllm_utils -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/csrc/nv_internal -I/workspace/flashinfer/csrc/nv_internal/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include -fPIC -O3 -std=c++17 -Wno-switch-bool -c /workspace/flashinfer/csrc/nv_internal/cpp/common/envUtils.cpp -o trtllm_utils/envUtils.o 2025-09-07T09:26:21.7183025Z #47 606.4 [325/412] c++ batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o 
batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:21.7199860Z #47 606.5 [326/412] c++ single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:21.8924586Z #47 606.8 [327/412] c++ single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:22.2260140Z #47 607.1 [328/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:26:22.3926053Z #47 607.2 [329/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:22.4994651Z #47 607.4 [330/412] c++ 
single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:22.6880868Z #47 607.6 [331/412] c++ single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o 
single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:22.7898166Z #47 607.7 [332/412] c++ batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:23.0009657Z #47 607.8 [333/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda 
-ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:23.0033280Z #47 607.9 [334/412] c++ single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:23.1717278Z #47 607.9 [335/412] c++ batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared 
-L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:23.2649738Z #47 608.2 [336/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c 
/workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o 2025-09-07T09:26:23.2668336Z #47 608.2 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:23.2672184Z #47 608.2 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:23.2675512Z #47 608.2 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:23.2676894Z #47 608.2 | 2025-09-07T09:26:23.2679060Z #47 608.2 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:23.2682843Z #47 608.2 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:23.2685852Z #47 608.2 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:23.2687210Z #47 608.2 | 2025-09-07T09:26:23.2689457Z #47 608.2 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:23.2693669Z #47 608.2 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:23.2696721Z #47 608.2 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:23.2698049Z #47 608.2 | 2025-09-07T09:26:23.2700227Z #47 608.2 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:23.2704113Z #47 608.2 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:23.2707181Z #47 608.2 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:23.2708875Z #47 608.2 | 2025-09-07T09:26:23.2711101Z #47 608.2 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:23.2715043Z #47 608.2 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:23.2718050Z #47 608.2 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:23.2719608Z #47 608.2 | 2025-09-07T09:26:23.2722044Z #47 608.2 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cu:16: 2025-09-07T09:26:23.2726039Z #47 608.2 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:23.2728938Z #47 608.2 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:23.2730252Z #47 608.2 | 2025-09-07T09:26:23.5082818Z #47 608.4 [337/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:23.6577549Z #47 608.4 [338/412] c++ batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:23.6599700Z #47 608.6 [339/412] c++ 
single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:23.7625552Z #47 608.7 [340/412] c++ single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o 
single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:23.9167770Z #47 608.8 [341/412] c++ single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_0.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_1.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_2.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_kernel_mask_3.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill.cuda.o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_jit_pybind.cuda.o -shared 
-L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:24.2293291Z #47 609.1 [342/412] c++ single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_kernel.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode.cuda.o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:24.4100342Z #47 609.2 [343/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_0.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 2025-09-07T09:26:24.4127106Z #47 609.3 [344/412] c++ batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_kernel.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode.cuda.o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so 2025-09-07T09:26:24.5673098Z #47 609.5 [345/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T09:26:24.7655211Z #47 609.5 [346/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T09:26:24.7760541Z #47 609.7 [347/412] c++ batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T09:26:24.9310604Z #47 609.7 [348/412] c++ batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 
batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T09:26:25.0975539Z #47 610.0 [349/412] c++ batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 
batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T09:26:25.2120589Z #47 610.1 [350/412] c++ batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 
batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_fp8_sm90.cuda.o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so 2025-09-07T09:26:25.2159103Z #47 610.1 [351/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem 
/workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o 2025-09-07T09:26:25.2178701Z #47 610.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu(100): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:26:25.2181637Z #47 610.1 bool use_swa = window_left != -1; 2025-09-07T09:26:25.2182299Z #47 610.1 ^ 2025-09-07T09:26:25.2182752Z #47 610.1 2025-09-07T09:26:25.2183712Z #47 610.1 Remark: The warnings can be suppressed with "-diag-suppress " 2025-09-07T09:26:25.2184717Z #47 610.1 2025-09-07T09:26:25.2187273Z #47 610.1 /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cu(197): warning #177-D: variable "use_swa" was declared but never referenced 2025-09-07T09:26:25.2190240Z #47 610.1 bool use_swa = window_left != -1; 
2025-09-07T09:26:25.2190903Z #47 610.1 ^ 2025-09-07T09:26:25.2191317Z #47 610.1 2025-09-07T09:26:25.4817452Z #47 610.4 [352/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o -shared 
-L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so 2025-09-07T09:26:25.7264354Z #47 610.5 [353/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so 2025-09-07T09:26:25.7944121Z #47 610.7 [354/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_2.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so 2025-09-07T09:26:25.9759294Z #47 610.7 [355/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so 2025-09-07T09:26:26.1096705Z #47 611.0 [356/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_0.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_paged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_ragged_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so 2025-09-07T09:26:28.2751518Z #47 613.2 [357/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o 2025-09-07T09:26:32.3552033Z #47 617.3 [358/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:26:33.5136885Z #47 618.4 [359/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so 2025-09-07T09:26:33.6558626Z #47 618.6 [360/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 
-c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o 2025-09-07T09:26:33.6577044Z #47 618.6 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu:21: 2025-09-07T09:26:33.6581057Z #47 618.6 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:33.6584484Z #47 618.6 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:33.6585859Z #47 618.6 | 2025-09-07T09:26:33.6588299Z #47 618.6 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu:21: 2025-09-07T09:26:33.6592526Z #47 618.6 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:33.6595822Z #47 618.6 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:33.6597255Z #47 618.6 | 2025-09-07T09:26:34.7363214Z #47 619.6 [361/412] c++ batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so 2025-09-07T09:26:42.1553264Z #47 627.1 [362/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC 
--expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o 2025-09-07T09:26:42.1571110Z #47 627.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:42.1574893Z #47 627.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:42.1579713Z #47 627.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:42.1581023Z #47 627.1 | 2025-09-07T09:26:42.1583237Z #47 627.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:42.1587137Z #47 627.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:42.1590083Z #47 627.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:42.1591461Z #47 627.1 | 2025-09-07T09:26:42.1593895Z #47 627.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:42.1597628Z #47 627.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:42.1600595Z #47 627.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:42.1602000Z #47 627.1 | 2025-09-07T09:26:42.1604169Z #47 627.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:42.1607931Z #47 627.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:42.1610820Z #47 627.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:42.1612245Z #47 627.1 | 2025-09-07T09:26:42.1614397Z #47 627.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:42.1618081Z #47 627.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:42.1621280Z #47 627.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:42.1622697Z #47 627.1 | 2025-09-07T09:26:42.1624846Z #47 627.1 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:42.1628819Z #47 627.1 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:42.1631959Z #47 627.1 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:42.1633337Z #47 627.1 | 2025-09-07T09:26:43.0349695Z #47 627.9 [363/412] c++ batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so 2025-09-07T09:26:43.7487345Z #47 628.7 [364/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o 2025-09-07T09:26:43.8980735Z #47 628.7 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu:16: 2025-09-07T09:26:43.8985092Z #47 628.7 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 
2025-09-07T09:26:43.8988067Z #47 628.7 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:43.8989341Z #47 628.7 | 2025-09-07T09:26:43.8991852Z #47 628.7 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cu:16: 2025-09-07T09:26:43.8996056Z #47 628.7 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:43.8999124Z #47 628.7 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:43.9000409Z #47 628.7 | 2025-09-07T09:26:44.0116662Z #47 628.9 [365/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o.d -DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC 
--expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o 2025-09-07T09:26:44.0135324Z #47 628.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu:21: 2025-09-07T09:26:44.0140857Z #47 628.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:44.0143976Z #47 628.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:44.0145442Z #47 628.9 | 2025-09-07T09:26:44.0147811Z #47 628.9 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cu:21: 2025-09-07T09:26:44.0152058Z #47 628.9 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:44.0155242Z #47 628.9 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:44.0156834Z #47 628.9 | 2025-09-07T09:26:44.6353503Z #47 629.5 [366/412] c++ batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_plan.cuda.o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_run.cuda.o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_sm90_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so 2025-09-07T09:26:46.1897700Z #47 631.1 [367/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output mla/flashinfer_mla_ops.cuda.o.d -DTORCH_EXTENSION_NAME=mla -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem 
/opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_100a,code=sm_100a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/csrc/flashinfer_mla_ops.cu -o mla/flashinfer_mla_ops.cuda.o 2025-09-07T09:26:46.3679893Z #47 631.1 [368/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cascade/flashinfer_cascade_ops.cuda.o.d -DTORCH_EXTENSION_NAME=cascade -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 
-gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_cascade_ops.cu -o cascade/flashinfer_cascade_ops.cuda.o 2025-09-07T09:26:48.1410994Z #47 633.1 [369/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output norm/flashinfer_norm_ops.cuda.o.d -DTORCH_EXTENSION_NAME=norm -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_norm_ops.cu -o norm/flashinfer_norm_ops.cuda.o 2025-09-07T09:26:48.6291709Z #47 633.5 [370/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output page/flashinfer_page_ops.cuda.o.d -DTORCH_EXTENSION_NAME=page -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem 
/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_page_ops.cu -o page/flashinfer_page_ops.cuda.o 2025-09-07T09:26:49.3079002Z #47 634.2 [371/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output cascade/cascade.cuda.o.d -DTORCH_EXTENSION_NAME=cascade -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 
-gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/cascade.cu -o cascade/cascade.cuda.o 2025-09-07T09:26:49.4631151Z #47 634.2 [372/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output quantization/flashinfer_quantization_ops.cuda.o.d -DTORCH_EXTENSION_NAME=quantization -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_quantization_ops.cu -o quantization/flashinfer_quantization_ops.cuda.o 2025-09-07T09:26:49.5778712Z #47 634.5 [373/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o 2025-09-07T09:26:49.5798345Z #47 634.5 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:49.5802268Z #47 634.5 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: 
backslash-newline at end of file 2025-09-07T09:26:49.5805403Z #47 634.5 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:49.5806759Z #47 634.5 | 2025-09-07T09:26:49.5809013Z #47 634.5 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:49.5813417Z #47 634.5 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:49.5816505Z #47 634.5 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:49.5818140Z #47 634.5 | 2025-09-07T09:26:49.5820411Z #47 634.5 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:49.5826480Z #47 634.5 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:49.5829536Z #47 634.5 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:49.5831125Z #47 634.5 | 2025-09-07T09:26:49.5834020Z #47 634.5 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:49.5837889Z #47 634.5 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:49.5840970Z #47 634.5 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:49.5842329Z #47 634.5 | 2025-09-07T09:26:49.5844544Z #47 634.5 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:49.5848497Z #47 634.5 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:49.5851738Z #47 634.5 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) 
\ 2025-09-07T09:26:49.5853333Z #47 634.5 | 2025-09-07T09:26:49.5855428Z #47 634.5 In file included from /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cu:21: 2025-09-07T09:26:49.5858928Z #47 634.5 /workspace/flashinfer/build/aot/generated/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_config.inc:29:111: warning: backslash-newline at end of file 2025-09-07T09:26:49.5862150Z #47 634.5 29 | #define DISPATCH_context(DTypeQ, DTypeKV, DTypeO, IdType, MASK_MODE, HEAD_DIM_CKV, HEAD_DIM_KPE, Params, ...) \ 2025-09-07T09:26:49.5863603Z #47 634.5 | 2025-09-07T09:26:49.9139358Z #47 634.8 [374/412] c++ cascade/cascade.cuda.o cascade/flashinfer_cascade_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o cascade/cascade.so 2025-09-07T09:26:50.2034445Z #47 635.1 [375/412] c++ batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_plan.cuda.o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_run.cuda.o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so 2025-09-07T09:26:50.6409772Z #47 635.6 [376/412] 
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output page/page.cuda.o.d -DTORCH_EXTENSION_NAME=page -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/page.cu -o page/page.cuda.o 2025-09-07T09:26:51.2631163Z #47 636.2 [377/412] c++ page/page.cuda.o page/flashinfer_page_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o page/page.so 2025-09-07T09:26:51.5066623Z #47 636.3 [378/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 
-DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:26:52.7772492Z #47 637.7 [379/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output rope/flashinfer_rope_ops.cuda.o.d -DTORCH_EXTENSION_NAME=rope -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem 
/workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_rope_ops.cu -o rope/flashinfer_rope_ops.cuda.o 2025-09-07T09:26:52.9545311Z #47 637.7 [380/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output quantization/quantization.cuda.o.d -DTORCH_EXTENSION_NAME=quantization -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/quantization.cu -o quantization/quantization.cuda.o 
2025-09-07T09:26:53.1403695Z #47 638.1 [381/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:26:53.2592631Z #47 638.2 [382/412] c++ quantization/quantization.cuda.o quantization/flashinfer_quantization_ops.cuda.o 
-shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o quantization/quantization.so 2025-09-07T09:26:53.7167224Z #47 638.6 [383/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:26:53.9959186Z #47 638.9 [384/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output norm/norm.cuda.o.d -DTORCH_EXTENSION_NAME=norm -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/norm.cu -o norm/norm.cuda.o 2025-09-07T09:26:54.2304123Z #47 639.0 [385/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output sampling/flashinfer_sampling_ops.cuda.o.d -DTORCH_EXTENSION_NAME=sampling -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem 
/usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/flashinfer_sampling_ops.cu -o sampling/flashinfer_sampling_ops.cuda.o 2025-09-07T09:26:54.4799862Z #47 639.4 [386/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 
--threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:26:54.5907249Z #47 639.5 [387/412] c++ norm/norm.cuda.o norm/flashinfer_norm_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o norm/norm.so 2025-09-07T09:26:55.2110300Z #47 640.1 [388/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output trtllm_utils/delayStream.cuda.o.d -DTORCH_EXTENSION_NAME=trtllm_utils -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -I/workspace/flashinfer/csrc/nv_internal -I/workspace/flashinfer/csrc/nv_internal/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include -I/workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem 
/workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/nv_internal/tensorrt_llm/kernels/delayStream.cu -o trtllm_utils/delayStream.cuda.o 2025-09-07T09:26:55.3574172Z #47 640.3 [389/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:26:55.6064051Z #47 640.5 [390/412] c++ trtllm_utils/delayStream.cuda.o trtllm_utils/envUtils.o trtllm_utils/logger.o trtllm_utils/stringUtils.o trtllm_utils/tllmException.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o trtllm_utils/trtllm_utils.so 2025-09-07T09:26:56.2248991Z #47 641.1 [391/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include 
--compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:26:58.3890774Z #47 643.3 [392/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math 
-DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:26:58.6266029Z #47 643.4 [393/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG 
-gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:26:58.9261263Z #47 643.8 [394/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:27:00.0351363Z #47 644.9 [395/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:27:00.5284534Z #47 645.4 [396/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o 2025-09-07T09:27:02.3249280Z #47 647.2 [397/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:27:04.0299714Z #47 648.9 [398/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o 2025-09-07T09:27:05.2703454Z #47 650.2 [399/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o 2025-09-07T09:27:05.8421592Z #47 650.8 [400/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c 
/workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:27:06.1748204Z #47 651.1 [401/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output sampling/renorm.cuda.o.d -DTORCH_EXTENSION_NAME=sampling -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/renorm.cu -o sampling/renorm.cuda.o 2025-09-07T09:27:06.4096529Z #47 651.2 [402/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so 2025-09-07T09:27:09.9645154Z #47 654.9 [403/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_90a,code=sm_90a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/build/aot/generated/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cu -o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o 2025-09-07T09:27:10.3756413Z #47 655.3 [404/412] c++ batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_0.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_1.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_2.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_paged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_ragged_sm90_kernel_mask_3.cuda.o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90.cuda.o 
batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_sm90_jit_pybind.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so 2025-09-07T09:27:12.8351220Z #47 657.7 [405/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output rope/rope.cuda.o.d -DTORCH_EXTENSION_NAME=rope -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/rope.cu -o rope/rope.cuda.o 2025-09-07T09:27:13.2606689Z #47 658.2 [406/412] c++ rope/rope.cuda.o 
rope/flashinfer_rope_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o rope/rope.so 2025-09-07T09:27:34.6950057Z #47 679.6 [407/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output sampling/sampling.cuda.o.d -DTORCH_EXTENSION_NAME=sampling -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -gencode=arch=compute_100,code=sm_100 -gencode=arch=compute_120,code=sm_120 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -c /workspace/flashinfer/csrc/sampling.cu -o sampling/sampling.cuda.o 2025-09-07T09:27:35.1214281Z #47 680.0 [408/412] c++ sampling/sampling.cuda.o sampling/renorm.cuda.o sampling/flashinfer_sampling_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o sampling/sampling.so 2025-09-07T09:27:43.4205346Z #47 688.3 [409/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output fmha_cutlass_sm100a/fmha_cutlass_sm100.cuda.o.d 
-DTORCH_EXTENSION_NAME=fmha_cutlass_sm100a -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_100a,code=sm_100a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/csrc/fmha_cutlass_sm100.cu -o fmha_cutlass_sm100a/fmha_cutlass_sm100.cuda.o 2025-09-07T09:27:43.7331986Z #47 688.6 [410/412] c++ fmha_cutlass_sm100a/fmha_cutlass_sm100.cuda.o fmha_cutlass_sm100a/fmha_cutlass_sm100_pybind.cuda.o fmha_cutlass_sm100a/blackwell_fmha_plan.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o fmha_cutlass_sm100a/fmha_cutlass_sm100a.so 2025-09-07T09:27:43.8583448Z #47 688.8 [411/412] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output mla/cutlass_mla.cuda.o.d -DTORCH_EXTENSION_NAME=mla -DTORCH_API_INCLUDE_EXTENSION_H -DPy_LIMITED_API=0x03090000 -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /opt/python/cp312-cp312/include/python3.12 -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include -isystem /opt/python/cp312-cp312/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem 
/usr/local/cuda/include -isystem /workspace/flashinfer/include -isystem /workspace/flashinfer/csrc -isystem /workspace/flashinfer/3rdparty/cutlass/include -isystem /workspace/flashinfer/3rdparty/cutlass/tools/util/include -isystem /workspace/flashinfer/3rdparty/spdlog/include --compiler-options=-fPIC --expt-relaxed-constexpr -static-global-template-stub=false -O3 -std=c++17 --threads=32 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -DNDEBUG -gencode=arch=compute_100a,code=sm_100a -DFLASHINFER_ENABLE_FP8_E8M0 -DFLASHINFER_ENABLE_FP4_E2M1 -c /workspace/flashinfer/csrc/cutlass_mla.cu -o mla/cutlass_mla.cuda.o 2025-09-07T09:27:44.1458810Z #47 689.1 [412/412] c++ mla/cutlass_mla.cuda.o mla/flashinfer_mla_ops.cuda.o -shared -L/opt/python/cp312-cp312/lib/python3.12/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -lcudart -o mla/mla.so 2025-09-07T09:27:44.4374978Z #47 689.4 AOT kernels saved to: /workspace/flashinfer/aot-ops 2025-09-07T09:27:44.8536333Z #47 689.8 * Getting build dependencies for wheel... 2025-09-07T09:27:44.9957562Z #47 689.9 60 AOT ops found in /workspace/flashinfer/aot-ops 2025-09-07T09:27:45.1286459Z #47 690.0 * Building wheel... 
2025-09-07T09:27:46.7759041Z #47 691.7 W0907 09:27:46.774000 22959 /opt/_internal/cpython-3.12.11/lib/python3.12/site-packages/torch/utils/cpp_extension.py:117] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' 2025-09-07T09:27:47.0801907Z #47 692.0 60 AOT ops found in /workspace/flashinfer/aot-ops 2025-09-07T09:27:47.0802369Z #47 692.0 running bdist_wheel 2025-09-07T09:27:47.2174645Z #47 692.0 running build 2025-09-07T09:27:47.2175079Z #47 692.0 running build_py 2025-09-07T09:27:47.2175575Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2176263Z #47 692.1 copying flashinfer/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2177306Z #47 692.1 copying flashinfer/__main__.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2178261Z #47 692.1 copying flashinfer/activation.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2179078Z #47 692.1 copying flashinfer/aot.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2180192Z #47 692.1 copying flashinfer/artifacts.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2180984Z #47 692.1 copying flashinfer/attention.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2181762Z #47 692.1 copying flashinfer/autotuner.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2182513Z #47 692.1 copying flashinfer/cascade.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2183432Z #47 692.1 copying flashinfer/cuda_utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2184176Z #47 692.1 copying flashinfer/decode.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2185258Z #47 692.1 copying flashinfer/deep_gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2186076Z #47 692.1 copying flashinfer/fp4_quantization.py -> 
build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2186885Z #47 692.1 copying flashinfer/fp8_quantization.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2187641Z #47 692.1 copying flashinfer/gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2188346Z #47 692.1 copying flashinfer/green_ctx.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2189054Z #47 692.1 copying flashinfer/mla.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2189870Z #47 692.1 copying flashinfer/norm.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2190611Z #47 692.1 copying flashinfer/page.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2191376Z #47 692.1 copying flashinfer/pod.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2192811Z #47 692.1 copying flashinfer/prefill.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2193606Z #47 692.1 copying flashinfer/quantization.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2194412Z #47 692.1 copying flashinfer/rope.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2195323Z #47 692.1 copying flashinfer/sampling.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2196096Z #47 692.1 copying flashinfer/sparse.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2197179Z #47 692.1 copying flashinfer/tllm_utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2197939Z #47 692.1 copying flashinfer/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2198688Z #47 692.1 copying flashinfer/_build_meta.py -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.2199371Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/cute_dsl 2025-09-07T09:27:47.2200195Z #47 692.1 copying 
flashinfer/cute_dsl/blockscaled_gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/cute_dsl 2025-09-07T09:27:47.2201132Z #47 692.1 copying flashinfer/cute_dsl/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/cute_dsl 2025-09-07T09:27:47.2202010Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/data 2025-09-07T09:27:47.2202753Z #47 692.1 copying ./custom_backend.py -> build/lib.linux-x86_64-cpython-312/flashinfer/data 2025-09-07T09:27:47.2203541Z #47 692.1 copying ./setup.py -> build/lib.linux-x86_64-cpython-312/flashinfer/data 2025-09-07T09:27:47.2204269Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe 2025-09-07T09:27:47.2205138Z #47 692.1 copying flashinfer/fused_moe/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe 2025-09-07T09:27:47.2206501Z #47 692.1 copying flashinfer/fused_moe/core.py -> build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe 2025-09-07T09:27:47.2207473Z #47 692.1 copying flashinfer/fused_moe/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe 2025-09-07T09:27:47.2208190Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T09:27:47.2208869Z #47 692.1 copying flashinfer/jit/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T09:27:47.2209675Z #47 692.1 copying flashinfer/jit/activation.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T09:27:47.2210655Z #47 692.1 copying flashinfer/jit/core.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T09:27:47.2211434Z #47 692.1 copying flashinfer/jit/cpp_ext.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T09:27:47.2212606Z #47 692.1 copying flashinfer/jit/cubin_loader.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T09:27:47.2213704Z #47 692.1 copying flashinfer/jit/env.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T09:27:47.2214539Z #47 692.1 copying 
flashinfer/jit/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit 2025-09-07T09:27:47.2215374Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T09:27:47.2216211Z #47 692.1 copying flashinfer/jit/attention/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T09:27:47.2217408Z #47 692.1 copying flashinfer/jit/attention/pytorch.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T09:27:47.2218482Z #47 692.1 copying flashinfer/jit/attention/tvm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T09:27:47.2219705Z #47 692.1 copying flashinfer/jit/attention/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T09:27:47.2220752Z #47 692.1 copying flashinfer/jit/attention/variants.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention 2025-09-07T09:27:47.2222095Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm 2025-09-07T09:27:47.2222982Z #47 692.1 copying flashinfer/jit/cutlass_gemm/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm 2025-09-07T09:27:47.2224263Z #47 692.1 copying flashinfer/jit/cutlass_gemm/cutlass_library.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm 2025-09-07T09:27:47.2225598Z #47 692.1 copying flashinfer/jit/cutlass_gemm/generate_kernels.py -> build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm 2025-09-07T09:27:47.2226601Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/testing 2025-09-07T09:27:47.2227329Z #47 692.1 copying flashinfer/testing/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/testing 2025-09-07T09:27:47.2228448Z #47 692.1 copying flashinfer/testing/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/testing 2025-09-07T09:27:47.2229246Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T09:27:47.2230039Z #47 
692.1 copying flashinfer/triton/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T09:27:47.2231248Z #47 692.1 copying flashinfer/triton/activation.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T09:27:47.2232268Z #47 692.1 copying flashinfer/triton/cascade.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T09:27:47.2233331Z #47 692.1 copying flashinfer/triton/gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T09:27:47.2234164Z #47 692.1 copying flashinfer/triton/norm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T09:27:47.2235404Z #47 692.1 copying flashinfer/triton/page.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T09:27:47.2236328Z #47 692.1 copying flashinfer/triton/sm_constraint_gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T09:27:47.2237586Z #47 692.1 copying flashinfer/triton/utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton 2025-09-07T09:27:47.2238389Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/tuning_configs 2025-09-07T09:27:47.2239552Z #47 692.1 copying flashinfer/tuning_configs/v0_1_trtllm_fused_moe_NVIDIA_B200.py -> build/lib.linux-x86_64-cpython-312/flashinfer/tuning_configs 2025-09-07T09:27:47.2240453Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/profiler 2025-09-07T09:27:47.2241316Z #47 692.1 copying flashinfer/profiler/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/profiler 2025-09-07T09:27:47.2242142Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T09:27:47.2243177Z #47 692.1 copying flashinfer/triton/kernels/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T09:27:47.2244500Z #47 692.1 copying flashinfer/triton/kernels/activation.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T09:27:47.2245678Z #47 692.1 
copying flashinfer/triton/kernels/cascade.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T09:27:47.2246689Z #47 692.1 copying flashinfer/triton/kernels/norm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T09:27:47.2247729Z #47 692.1 copying flashinfer/triton/kernels/quant.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T09:27:47.2248802Z #47 692.1 copying flashinfer/triton/kernels/sm_constraint_gemm.py -> build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels 2025-09-07T09:27:47.2249729Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2250483Z #47 692.1 copying flashinfer/comm/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2251378Z #47 692.1 copying flashinfer/comm/cuda_ipc.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2252672Z #47 692.1 copying flashinfer/comm/dlpack_utils.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2253725Z #47 692.1 copying flashinfer/comm/mapping.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2254894Z #47 692.1 copying flashinfer/comm/mnnvl.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2255716Z #47 692.1 copying flashinfer/comm/nvshmem.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2256688Z #47 692.1 copying flashinfer/comm/nvshmem_allreduce.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2257705Z #47 692.1 copying flashinfer/comm/trtllm_alltoall.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2258781Z #47 692.1 copying flashinfer/comm/trtllm_ar.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2259792Z #47 692.1 copying flashinfer/comm/trtllm_mnnvl_ar.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2260648Z 
#47 692.1 copying flashinfer/comm/vllm_ar.py -> build/lib.linux-x86_64-cpython-312/flashinfer/comm 2025-09-07T09:27:47.2261470Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/cudnn 2025-09-07T09:27:47.2262171Z #47 692.1 copying flashinfer/cudnn/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/cudnn 2025-09-07T09:27:47.2263333Z #47 692.1 copying flashinfer/cudnn/decode.py -> build/lib.linux-x86_64-cpython-312/flashinfer/cudnn 2025-09-07T09:27:47.2264301Z #47 692.1 copying flashinfer/cudnn/prefill.py -> build/lib.linux-x86_64-cpython-312/flashinfer/cudnn 2025-09-07T09:27:47.2265218Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2266198Z #47 692.1 copying flashinfer/logits_processor/__init__.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2267388Z #47 692.1 copying flashinfer/logits_processor/compiler.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2268639Z #47 692.1 copying flashinfer/logits_processor/fusion_rules.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2269744Z #47 692.1 copying flashinfer/logits_processor/legalization.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2271121Z #47 692.1 copying flashinfer/logits_processor/op.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2272211Z #47 692.1 copying flashinfer/logits_processor/operators.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2273342Z #47 692.1 copying flashinfer/logits_processor/pipeline.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2274492Z #47 692.1 copying flashinfer/logits_processor/processors.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2275687Z #47 692.1 copying 
flashinfer/logits_processor/types.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2276900Z #47 692.1 copying flashinfer/logits_processor/validators.py -> build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor 2025-09-07T09:27:47.2277814Z #47 692.1 copying flashinfer/py.typed -> build/lib.linux-x86_64-cpython-312/flashinfer 2025-09-07T09:27:47.3176762Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3177558Z #47 692.1 copying ./csrc/activation.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3178592Z #47 692.1 copying ./csrc/aot_extension_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3179476Z #47 692.1 copying ./csrc/batch_attention.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3180442Z #47 692.1 copying ./csrc/batch_attention_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3181643Z #47 692.1 copying ./csrc/batch_attention_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3182848Z #47 692.1 copying ./csrc/batch_attention_paged_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3183831Z #47 692.1 copying ./csrc/batch_decode.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3184939Z #47 692.1 copying ./csrc/batch_decode_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3185928Z #47 692.1 copying ./csrc/batch_decode_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3187020Z #47 692.1 copying ./csrc/batch_decode_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3187996Z #47 692.1 copying ./csrc/batch_decode_kernel_inst.jinja -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3188930Z #47 692.1 copying ./csrc/batch_decode_mla_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3189964Z #47 692.1 copying ./csrc/batch_decode_mla_cute_sm80.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3190847Z #47 692.1 copying ./csrc/batch_decode_mla_plan.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3191812Z #47 692.1 copying ./csrc/batch_decode_mla_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3193382Z #47 692.1 copying ./csrc/batch_decode_mla_run.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3194454Z #47 692.1 copying ./csrc/batch_mla_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3195385Z #47 692.1 copying ./csrc/batch_mla_plan.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3196228Z #47 692.1 copying ./csrc/batch_mla_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3197266Z #47 692.1 copying ./csrc/batch_mla_run.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3198251Z #47 692.1 copying ./csrc/batch_mla_sm90_plan.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3199233Z #47 692.1 copying ./csrc/batch_mla_sm90_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3200179Z #47 692.1 copying ./csrc/batch_mla_sm90_run.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3201026Z #47 692.1 copying ./csrc/batch_prefill.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3202059Z #47 692.1 copying ./csrc/batch_prefill_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3203032Z #47 
692.1 copying ./csrc/batch_prefill_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3204283Z #47 692.1 copying ./csrc/batch_prefill_fp8_paged_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3205349Z #47 692.1 copying ./csrc/batch_prefill_fp8_ragged_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3206323Z #47 692.1 copying ./csrc/batch_prefill_fp8_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3207260Z #47 692.1 copying ./csrc/batch_prefill_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3208327Z #47 692.1 copying ./csrc/batch_prefill_paged_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3209344Z #47 692.1 copying ./csrc/batch_prefill_paged_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3210565Z #47 692.1 copying ./csrc/batch_prefill_ragged_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3211849Z #47 692.1 copying ./csrc/batch_prefill_ragged_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3213085Z #47 692.1 copying ./csrc/batch_prefill_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3214166Z #47 692.1 copying ./csrc/batch_prefill_sm90_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3215317Z #47 692.1 copying ./csrc/batch_prefill_sm90_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3216546Z #47 692.1 copying ./csrc/batch_prefill_sm90_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3217479Z #47 692.1 copying ./csrc/blackwell_fmha_plan.cu -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3218303Z #47 692.1 copying ./csrc/bmm_fp8.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3219367Z #47 692.1 copying ./csrc/cascade.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3220240Z #47 692.1 copying ./csrc/cudnn_sdpa_kernel_launcher.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3221273Z #47 692.1 copying ./csrc/cudnn_sdpa_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3222102Z #47 692.1 copying ./csrc/cutlass_mla.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3223134Z #47 692.1 copying ./csrc/flashinfer_cascade_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3224294Z #47 692.1 copying ./csrc/flashinfer_gemm_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3225473Z #47 692.1 copying ./csrc/flashinfer_gemm_sm90_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3226348Z #47 692.1 copying ./csrc/flashinfer_mla_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3227200Z #47 692.1 copying ./csrc/flashinfer_norm_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3228403Z #47 692.1 copying ./csrc/flashinfer_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3229231Z #47 692.1 copying ./csrc/flashinfer_ops_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3230187Z #47 692.1 copying ./csrc/flashinfer_page_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3231161Z #47 692.1 copying ./csrc/flashinfer_quantization_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3232139Z #47 692.1 copying 
./csrc/flashinfer_rope_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3233249Z #47 692.1 copying ./csrc/flashinfer_sampling_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3234157Z #47 692.1 copying ./csrc/fmha_cutlass_sm100.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3235033Z #47 692.1 copying ./csrc/fmha_cutlass_sm100_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3235884Z #47 692.1 copying ./csrc/fp4_gemm_cutlass.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3236725Z #47 692.1 copying ./csrc/fp4_gemm_cutlass.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3237567Z #47 692.1 copying ./csrc/fp8_gemm_cutlass.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3238392Z #47 692.1 copying ./csrc/fp8_gemm_cutlass.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3239245Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T09:27:47.3240401Z #47 692.1 copying ./csrc/fused_moe/cutlass_backend/cutlass_fused_moe_instantiation.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T09:27:47.3241855Z #47 692.1 copying ./csrc/fused_moe/cutlass_backend/cutlass_fused_moe_kernels.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T09:27:47.3243524Z #47 692.1 copying ./csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_ops.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T09:27:47.3244949Z #47 692.1 copying ./csrc/gemm_groupwise_sm100.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3246161Z #47 692.1 copying 
./csrc/gemm_groupwise_sm100_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3247085Z #47 692.1 copying ./csrc/gemm_sm100_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3247961Z #47 692.1 copying ./csrc/group_gemm.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3249052Z #47 692.1 copying ./csrc/group_gemm_fp8_groupwise_sm100.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3250144Z #47 692.1 copying ./csrc/group_gemm_fp8_groupwise_sm100_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3251465Z #47 692.1 copying ./csrc/group_gemm_mxfp4_groupwise_sm100.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3252872Z #47 692.1 copying ./csrc/group_gemm_mxfp4_groupwise_sm100_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3254265Z #47 692.1 copying ./csrc/group_gemm_sm100_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3255281Z #47 692.1 copying ./csrc/group_gemm_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3256309Z #47 692.1 copying ./csrc/group_gemm_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3257192Z #47 692.1 copying ./csrc/logging.cc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3258073Z #47 692.1 copying ./csrc/norm.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3258869Z #47 692.1 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:47.3260109Z #47 692.1 copying ./csrc/nv_internal/cpp/common/envUtils.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:47.3261381Z #47 692.1 copying 
./csrc/nv_internal/cpp/common/logger.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:47.3262831Z #47 692.1 copying ./csrc/nv_internal/cpp/common/memoryUtils.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:47.3264342Z #47 692.1 copying ./csrc/nv_internal/cpp/common/stringUtils.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:47.3265893Z #47 692.2 copying ./csrc/nv_internal/cpp/common/tllmException.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:47.3267055Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/kernels 2025-09-07T09:27:47.3280647Z #47 692.2 copying ./csrc/nv_internal/cpp/kernels/quantization.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/kernels 2025-09-07T09:27:47.3282318Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3284000Z #47 692.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/NvInferRuntime.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3285675Z #47 692.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/assert.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3287443Z #47 692.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/cudaBf16Wrapper.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3289237Z #47 692.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/cudaFp8Utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3290973Z #47 692.2 copying 
./csrc/nv_internal/include/tensorrt_llm/common/cudaUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3293052Z #47 692.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/dataType.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3294843Z #47 692.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/logger.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3296749Z #47 692.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/quantization.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3298560Z #47 692.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/stringUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3300503Z #47 692.2 copying ./csrc/nv_internal/include/tensorrt_llm/common/tllmException.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:47.3301863Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:47.3303427Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/common/cublasMMWrapper.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:47.3305433Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/common/cudaBf16Fallbacks.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:47.3307178Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/common/cudaDriverWrapper.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:47.3308693Z #47 692.2 copying 
./csrc/nv_internal/tensorrt_llm/common/cudaTypeUtils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:47.3310272Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/common/envUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:47.3312008Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/common/memoryUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:47.3313876Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/common/quantTypeUtils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:47.3315398Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:47.3317026Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/common/workspace.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:47.3318674Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:47.3320851Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_red_global.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:47.3323388Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_sm90_multimem.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:47.3326407Z #47 692.2 copying 
./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_traits_sm90_multimem.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:47.3329436Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/grid_dependency_control.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:47.3332090Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:47.3334714Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective 2025-09-07T09:27:47.3337584Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective/sm90_allreduce_nvls_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective 2025-09-07T09:27:47.3340446Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/compute_occupancy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:47.3342396Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective 2025-09-07T09:27:47.3344922Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective/mixed_input_utils.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective 2025-09-07T09:27:47.3347328Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective 2025-09-07T09:27:47.3349455Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective/epilogue_moe_finalize.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective 2025-09-07T09:27:47.3351614Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion 2025-09-07T09:27:47.3353781Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_allreduce_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion 2025-09-07T09:27:47.3355948Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread 2025-09-07T09:27:47.3357987Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread/fused_activations.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread 2025-09-07T09:27:47.3360422Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue_helpers.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:47.3362319Z #47 692.2 
creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T09:27:47.3364536Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_gated.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T09:27:47.3367428Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_interleaved.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T09:27:47.3370352Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_mixed_input.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T09:27:47.3373421Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_gated.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:47.3376331Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_interleaved.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:47.3379219Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_mixed_input.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:47.3382047Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_array_mixed_input.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:47.3384845Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_gated.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:47.3387695Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_interleaved.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:47.3390538Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input_.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:47.3393906Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:47.3396844Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized_fp8.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:47.3399876Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_interleaved_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:47.3402160Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3404318Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/default_fpA_intB_traits.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3406915Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3409396Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_routine.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3412019Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_traits.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3414928Z #47 692.2 copying 
./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_moe_problem_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3417603Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_universal_allreduce.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3420947Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/mixed_gemm_B_layout.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3423712Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cute_util.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3426354Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cutlass_kernel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3428852Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_problem_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3431479Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3434243Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized_pingpong.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:47.3436385Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3438420Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3441038Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3443713Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3446374Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3449011Z #47 692.2 copying 
./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma_bf16.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3451679Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3454545Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3457299Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_finegrained.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3460128Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_percol.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3462827Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3465660Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_finegrained.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3468338Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_percol.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:47.3470451Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T09:27:47.3472334Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/default_mma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T09:27:47.3474789Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_compute_B_with_f16.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T09:27:47.3477268Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_dequantizer.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T09:27:47.3479541Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm_configs.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:47.3481828Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/interleaved_numeric_conversion.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:47.3484068Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/system_barrier.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:47.3486295Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/tile_interleaved_layout.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:47.3488166Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock 2025-09-07T09:27:47.3490304Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock/fine_grained_scale_zero_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock 2025-09-07T09:27:47.3492880Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util 2025-09-07T09:27:47.3494802Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util/gather_tensor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util 2025-09-07T09:27:47.3497173Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/weight_only_quant_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:47.3498890Z #47 692.2 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T09:27:47.3500403Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T09:27:47.3502279Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T09:27:47.3504158Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_type_conversion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T09:27:47.3505939Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T09:27:47.3507630Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T09:27:47.3509776Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm_stub.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T09:27:47.3511454Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3513072Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scalebias.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3515227Z 
#47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scaleonly.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3517254Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_per_col.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3519326Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scalebias.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3521381Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scaleonly.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3523414Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_per_col.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3525519Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_bf16_out_bf16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3527660Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_f16_out_f16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3529792Z #47 692.2 copying 
./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_bf16_out_bf16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3531941Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_f16_out_f16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3534374Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_per_col_f16_out_f16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3536564Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scalebias.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3538751Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scaleonly.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3540916Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_per_col.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3543070Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scalebias.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3545334Z #47 692.2 copying 
./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scaleonly.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3547349Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_per_col.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3549208Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3551095Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3553013Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template_sm90.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:47.3554582Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T09:27:47.3556254Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T09:27:47.3558375Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.inl -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T09:27:47.3559968Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:47.3561342Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/common.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:47.3563120Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/cutlass_kernel_selector.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:47.3564923Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_gemm_kernels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:47.3566659Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_kernels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:47.3568410Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_util_kernels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:47.3569871Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:47.3571468Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:47.3573865Z #47 
692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:47.3576216Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:47.3578534Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:47.3581756Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:47.3584169Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:47.3586495Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_bf16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3588545Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3590554Z #47 692.2 copying 
./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp8.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3592917Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3595077Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint8.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3597166Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp16.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3599543Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3602113Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3604202Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint8.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3606354Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp32_fp32.cu -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3608358Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp4_fp4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3610330Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3612544Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp8.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3614778Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_uint4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3616899Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3619034Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3621892Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws_mixed_dtype.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3624186Z #47 692.2 copying 
./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_tma_warp_specialized_input.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3626438Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_tma_warp_specialized_traits.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:47.3628202Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/delayStream.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:47.3629645Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/delayStream.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:47.3630844Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 2025-09-07T09:27:47.3632070Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/lora/lora.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 2025-09-07T09:27:47.3633543Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/lora/lora.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 2025-09-07T09:27:47.3635046Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:47.3636586Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:47.3638100Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/quantization.cuh -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:47.3639558Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/kernels/quantization.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:47.3640733Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime 2025-09-07T09:27:47.3641920Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/runtime/torchUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime 2025-09-07T09:27:47.3643066Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:47.3644220Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/thop/fp4Op.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:47.3645605Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:47.3646996Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:47.3648378Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.cpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:47.3649777Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:47.3651121Z #47 692.2 copying ./csrc/nv_internal/tensorrt_llm/thop/thUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:47.3652209Z #47 692.2 copying ./csrc/nvshmem_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 
2025-09-07T09:27:47.3653247Z #47 692.2 copying ./csrc/page.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3653998Z #47 692.2 copying ./csrc/pod.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3654801Z #47 692.2 copying ./csrc/pod_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3655680Z #47 692.2 copying ./csrc/pod_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3656568Z #47 692.2 copying ./csrc/pod_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3657407Z #47 692.2 copying ./csrc/pod_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3658322Z #47 692.2 copying ./csrc/pytorch_conversion_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3659240Z #47 692.2 copying ./csrc/pytorch_extension_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3660121Z #47 692.2 copying ./csrc/quantization.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3660935Z #47 692.2 copying ./csrc/renorm.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3661682Z #47 692.2 copying ./csrc/rope.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3662470Z #47 692.2 copying ./csrc/runtime_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3663267Z #47 692.2 copying ./csrc/sampling.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3664085Z #47 692.2 copying ./csrc/single_decode.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3665196Z #47 692.2 copying ./csrc/single_decode_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3666102Z #47 692.2 copying 
./csrc/single_decode_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3667021Z #47 692.2 copying ./csrc/single_decode_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3667906Z #47 692.2 copying ./csrc/single_decode_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3668760Z #47 692.2 copying ./csrc/single_prefill.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3669591Z #47 692.2 copying ./csrc/single_prefill_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3670506Z #47 692.2 copying ./csrc/single_prefill_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3671482Z #47 692.2 copying ./csrc/single_prefill_fp8_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3672428Z #47 692.2 copying ./csrc/single_prefill_fp8_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3673372Z #47 692.2 copying ./csrc/single_prefill_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3674280Z #47 692.2 copying ./csrc/single_prefill_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3675151Z #47 692.2 copying ./csrc/single_prefill_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3676049Z #47 692.2 copying ./csrc/single_prefill_sm90_config.inc -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3677010Z #47 692.2 copying ./csrc/single_prefill_sm90_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3678102Z #47 692.2 copying ./csrc/single_prefill_sm90_jit_pybind.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 
2025-09-07T09:27:47.3678998Z #47 692.2 copying ./csrc/single_prefill_sm90_kernel_inst.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3679824Z #47 692.2 copying ./csrc/trtllm_allreduce.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3680620Z #47 692.2 copying ./csrc/trtllm_allreduce_fusion.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3681445Z #47 692.2 copying ./csrc/trtllm_alltoall.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3682233Z #47 692.2 copying ./csrc/trtllm_batched_gemm_runner.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3683082Z #47 692.2 copying ./csrc/trtllm_fmha_kernel_launcher.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3683913Z #47 692.2 copying ./csrc/trtllm_fused_moe_dev_kernel.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3684786Z #47 692.2 copying ./csrc/trtllm_fused_moe_kernel_launcher.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3685685Z #47 692.2 copying ./csrc/trtllm_fused_moe_routing_deepseek.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3686564Z #47 692.2 copying ./csrc/trtllm_fused_moe_routing_llama4.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3687483Z #47 692.2 copying ./csrc/trtllm_fused_moe_routing_renormalize.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3688346Z #47 692.2 copying ./csrc/trtllm_fused_moe_runner.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3689138Z #47 692.2 copying ./csrc/trtllm_gemm_runner.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3689930Z #47 692.2 copying ./csrc/trtllm_mnnvl_allreduce.cu -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3690754Z #47 692.2 copying ./csrc/trtllm_moe_allreduce_fusion.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3691584Z #47 692.2 copying ./csrc/vllm_custom_all_reduce.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc 2025-09-07T09:27:47.3692761Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3693895Z #47 692.2 copying ./include/flashinfer/activation.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3695027Z #47 692.2 copying ./include/flashinfer/allocator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3696134Z #47 692.2 copying ./include/flashinfer/arch_condition.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3697260Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:47.3698827Z #47 692.2 copying ./include/flashinfer/attention/blackwell/collective/fmha_common.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:47.3700753Z #47 692.2 copying ./include/flashinfer/attention/blackwell/collective/fmha_fusion.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:47.3702791Z #47 692.2 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_fwd_epilogue_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:47.3705236Z #47 692.2 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_fwd_mainloop_tma_warpspecialized.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:47.3707166Z #47 692.2 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_gen_epilogue_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:47.3709268Z #47 692.2 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_gen_mainloop_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:47.3711594Z #47 692.2 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_load_cpasync_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:47.3713669Z #47 692.2 copying ./include/flashinfer/attention/blackwell/collective/sm100_fmha_load_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:47.3715370Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/common 2025-09-07T09:27:47.3716705Z #47 692.2 copying ./include/flashinfer/attention/blackwell/common/pow_2.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/common 2025-09-07T09:27:47.3718047Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T09:27:47.3719393Z #47 692.2 copying ./include/flashinfer/attention/blackwell/device/fmha.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T09:27:47.3721054Z #47 692.2 copying ./include/flashinfer/attention/blackwell/device/sm100_mla.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/device 
2025-09-07T09:27:47.3722697Z #47 692.2 copying ./include/flashinfer/attention/blackwell/fmha_cutlass_sm100.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell 2025-09-07T09:27:47.3724009Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:47.3725379Z #47 692.2 copying ./include/flashinfer/attention/blackwell/kernel/fmha_options.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:47.3727349Z #47 692.2 copying ./include/flashinfer/attention/blackwell/kernel/fmha_tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:47.3728967Z #47 692.2 copying ./include/flashinfer/attention/blackwell/kernel/gather_tensor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:47.3730674Z #47 692.2 copying ./include/flashinfer/attention/blackwell/kernel/sm100_fmha_fwd_kernel_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:47.3732592Z #47 692.2 copying ./include/flashinfer/attention/blackwell/kernel/sm100_fmha_gen_kernel_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:47.3734740Z #47 692.2 copying ./include/flashinfer/attention/blackwell/kernel/sm100_fmha_mla_reduction.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:47.3736665Z #47 692.2 copying ./include/flashinfer/attention/blackwell/kernel/sm100_fmha_mla_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:47.3738634Z #47 692.2 copying 
./include/flashinfer/attention/blackwell/kernel/sm100_mla_tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:47.3740349Z #47 692.2 copying ./include/flashinfer/attention/blackwell/plan.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell 2025-09-07T09:27:47.3741777Z #47 692.2 copying ./include/flashinfer/attention/cascade.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3743122Z #47 692.2 copying ./include/flashinfer/attention/cutlass_mla.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3744488Z #47 692.2 copying ./include/flashinfer/attention/decode.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3746094Z #47 692.2 copying ./include/flashinfer/attention/decode_mla_cute_sm80.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3747483Z #47 692.2 copying ./include/flashinfer/attention/default_decode_params.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3748884Z #47 692.2 copying ./include/flashinfer/attention/default_prefill_params.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3750213Z #47 692.2 copying ./include/flashinfer/attention/heap.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3751463Z #47 692.2 copying ./include/flashinfer/attention/hopper.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3752528Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3753762Z #47 692.2 copying 
./include/flashinfer/attention/hopper/attention_updater.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3755313Z #47 692.2 copying ./include/flashinfer/attention/hopper/block_sparse_gather.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3756830Z #47 692.2 copying ./include/flashinfer/attention/hopper/default_params.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3758499Z #47 692.2 copying ./include/flashinfer/attention/hopper/epilogue.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3759842Z #47 692.2 copying ./include/flashinfer/attention/hopper/kernel_traits.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3761197Z #47 692.2 copying ./include/flashinfer/attention/hopper/mainloop.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3762551Z #47 692.2 copying ./include/flashinfer/attention/hopper/mainloop_mma.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3763941Z #47 692.2 copying ./include/flashinfer/attention/hopper/named_barrier.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3765343Z #47 692.2 copying ./include/flashinfer/attention/hopper/prefill_sm90.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3766516Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:47.3767821Z #47 692.2 copying ./include/flashinfer/attention/hopper/quantization/epilogue.cuh -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:47.3769521Z #47 692.2 copying ./include/flashinfer/attention/hopper/quantization/kernel_traits.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:47.3771197Z #47 692.2 copying ./include/flashinfer/attention/hopper/quantization/mainloop_load.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:47.3773104Z #47 692.2 copying ./include/flashinfer/attention/hopper/quantization/mainloop_mma.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:47.3775022Z #47 692.2 copying ./include/flashinfer/attention/hopper/quantization/mainloop_sparse_load.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:47.3777014Z #47 692.2 copying ./include/flashinfer/attention/hopper/quantization/prefill_sm90.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:47.3778730Z #47 692.2 copying ./include/flashinfer/attention/hopper/sparse_mainloop.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3780306Z #47 692.2 copying ./include/flashinfer/attention/hopper/tile_scheduler.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3781824Z #47 692.2 copying ./include/flashinfer/attention/hopper/utils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3783325Z #47 692.2 copying ./include/flashinfer/attention/hopper/variant_helper.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3784872Z 
#47 692.2 copying ./include/flashinfer/attention/hopper/variants.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:47.3786243Z #47 692.2 copying ./include/flashinfer/attention/mask.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3787386Z #47 692.2 copying ./include/flashinfer/attention/mla.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3788548Z #47 692.2 copying ./include/flashinfer/attention/mla_hopper.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3789724Z #47 692.2 copying ./include/flashinfer/attention/mla_params.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3790926Z #47 692.2 copying ./include/flashinfer/attention/persistent.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3792511Z #47 692.2 copying ./include/flashinfer/attention/persistent_template.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3793877Z #47 692.2 copying ./include/flashinfer/attention/pod.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3795184Z #47 692.2 copying ./include/flashinfer/attention/prefill.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3796580Z #47 692.2 copying ./include/flashinfer/attention/scheduler.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3797950Z #47 692.2 copying ./include/flashinfer/attention/state.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3799297Z #47 692.2 copying ./include/flashinfer/attention/variant_helper.cuh -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3800650Z #47 692.2 copying ./include/flashinfer/attention/variants.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:47.3801942Z #47 692.2 copying ./include/flashinfer/attention_impl.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3802922Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:47.3803958Z #47 692.2 copying ./include/flashinfer/comm/trtllm_allreduce.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:47.3805443Z #47 692.2 copying ./include/flashinfer/comm/trtllm_allreduce_fusion.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:47.3806655Z #47 692.2 copying ./include/flashinfer/comm/trtllm_alltoall.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:47.3807914Z #47 692.2 copying ./include/flashinfer/comm/trtllm_mnnvl_allreduce.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:47.3809192Z #47 692.2 copying ./include/flashinfer/comm/trtllm_moe_allreduce_fusion.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:47.3810439Z #47 692.2 copying ./include/flashinfer/comm/vllm_custom_all_reduce.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:47.3811561Z #47 692.2 copying ./include/flashinfer/cp_async.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3812686Z #47 692.2 copying ./include/flashinfer/cubin_loader.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3813957Z #47 692.2 copying ./include/flashinfer/cutlass_utils.cuh -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3815076Z #47 692.2 copying ./include/flashinfer/exception.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3816156Z #47 692.2 copying ./include/flashinfer/fastdiv.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3817210Z #47 692.2 copying ./include/flashinfer/fp16.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3818274Z #47 692.2 copying ./include/flashinfer/fp4_layout.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3819413Z #47 692.2 copying ./include/flashinfer/frag_layout_swizzle.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3820406Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3821376Z #47 692.2 copying ./include/flashinfer/gemm/bmm_fp8.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3822606Z #47 692.2 copying ./include/flashinfer/gemm/cutlass_gemm_configs.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3823869Z #47 692.2 copying ./include/flashinfer/gemm/fp4_gemm_cutlass.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3825336Z #47 692.2 copying ./include/flashinfer/gemm/fp4_gemm_cutlass_template.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3826614Z #47 692.2 copying ./include/flashinfer/gemm/fp4_gemm_template_sm100.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3827838Z #47 692.2 copying ./include/flashinfer/gemm/fp8_gemm_cutlass.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3829026Z #47 692.2 copying ./include/flashinfer/gemm/fp8_gemm_cutlass_template.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3830259Z #47 692.2 copying ./include/flashinfer/gemm/fp8_gemm_template_sm100.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3831517Z #47 692.2 copying ./include/flashinfer/gemm/gemm_groupwise_sm100.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3832688Z #47 692.2 copying ./include/flashinfer/gemm/group_gemm.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3833911Z #47 692.2 copying ./include/flashinfer/gemm/group_gemm_fp8_groupwise_sm100.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3835137Z #47 692.2 copying ./include/flashinfer/gemm/group_gemm_lora.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3836418Z #47 692.2 copying ./include/flashinfer/gemm/group_gemm_mxfp4_groupwise_sm100.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3837673Z #47 692.2 copying ./include/flashinfer/gemm/group_gemm_sm90.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3838810Z #47 692.2 copying ./include/flashinfer/gemm/group_gemv.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:47.3839880Z #47 692.2 copying ./include/flashinfer/layout.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3840893Z #47 692.2 copying ./include/flashinfer/logging.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3841874Z #47 692.2 
copying ./include/flashinfer/math.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3842851Z #47 692.2 copying ./include/flashinfer/mma.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3843818Z #47 692.2 copying ./include/flashinfer/norm.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3844801Z #47 692.2 copying ./include/flashinfer/page.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3845834Z #47 692.2 copying ./include/flashinfer/permuted_smem.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3846866Z #47 692.2 copying ./include/flashinfer/pos_enc.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3847884Z #47 692.2 copying ./include/flashinfer/profiler.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3848929Z #47 692.2 copying ./include/flashinfer/quantization.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3849988Z #47 692.2 copying ./include/flashinfer/sampling.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.3850961Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm 2025-09-07T09:27:47.3852148Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/KernelRunner.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm 2025-09-07T09:27:47.3853762Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.3855455Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmEnums.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.3857568Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmInterface.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.3859702Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmOptions.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.3861717Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/Enums.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.3863735Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/GemmGatedActOptions.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.4225258Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/GemmOptions.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.4227440Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelParams.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.4229472Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelParamsDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.4231443Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelTraits.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.4233424Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/TmaDescriptor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:47.4235059Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:47.4236796Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/CommonUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:47.4239281Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/CudaKernelLauncher.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:47.4241443Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/DtypeDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:47.4244371Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/MmaDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:47.4246557Z #47 692.2 copying ./include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/SfLayoutDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:47.4248254Z #47 692.2 copying ./include/flashinfer/trtllm/common.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm 
2025-09-07T09:27:47.4249367Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:47.4250636Z #47 692.2 copying ./include/flashinfer/trtllm/common/cudaBf16Fallbacks.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:47.4252130Z #47 692.2 copying ./include/flashinfer/trtllm/common/cudaBf16Wrapper.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:47.4253858Z #47 692.2 copying ./include/flashinfer/trtllm/common/cudaFp8Utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:47.4255455Z #47 692.2 copying ./include/flashinfer/trtllm/common/cudaTypeUtils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:47.4256914Z #47 692.2 copying ./include/flashinfer/trtllm/common/cudaUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:47.4258116Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/cubin 2025-09-07T09:27:47.4259403Z #47 692.2 copying ./include/flashinfer/trtllm/fmha/cubin/kernelMetaInfo.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/cubin 2025-09-07T09:27:47.4260917Z #47 692.2 copying ./include/flashinfer/trtllm/fmha/decoder_impl_common.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:47.4262392Z #47 692.2 copying ./include/flashinfer/trtllm/fmha/decoder_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:47.4264091Z #47 692.2 copying ./include/flashinfer/trtllm/fmha/fmhaKernels.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:47.4265965Z #47 692.2 copying 
./include/flashinfer/trtllm/fmha/fmhaRunner.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:47.4267381Z #47 692.2 copying ./include/flashinfer/trtllm/fmha/fmhaRunnerParams.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:47.4268790Z #47 692.2 copying ./include/flashinfer/trtllm/fmha/kernelParams.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:47.4270107Z #47 692.2 copying ./include/flashinfer/trtllm/fmha/lse.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:47.4271200Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:47.4272356Z #47 692.2 copying ./include/flashinfer/trtllm/fused_moe/DevKernel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:47.4273779Z #47 692.2 copying ./include/flashinfer/trtllm/fused_moe/IntFastDiv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:47.4275241Z #47 692.2 copying ./include/flashinfer/trtllm/fused_moe/RoutingKernel.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:47.4276702Z #47 692.2 copying ./include/flashinfer/trtllm/fused_moe/RoutingKernel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:47.4278210Z #47 692.2 copying ./include/flashinfer/trtllm/fused_moe/RoutingKernelTopK.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:47.4279668Z #47 692.2 copying ./include/flashinfer/trtllm/fused_moe/runner.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:47.4280902Z #47 692.2 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:47.4282395Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/Enums.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:47.4284239Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/GemmInterface.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:47.4286077Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/GemmOptions.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:47.4287962Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/KernelParams.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:47.4289796Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/KernelTraits.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:47.4291627Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/TmaDescriptor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:47.4293674Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:47.4295443Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/CommonUtils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:47.4297636Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/CudaKernelLauncher.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:47.4299801Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/DtypeDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:47.4301895Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/MmaDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:47.4303989Z #47 692.2 copying ./include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/SfLayoutDecl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:47.4305654Z #47 692.2 copying ./include/flashinfer/utils.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.4306667Z #47 692.2 copying ./include/flashinfer/vec_dtypes.cuh -> build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer 2025-09-07T09:27:47.4307512Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4308280Z #47 692.2 copying ./tvm_binding/batch_decode.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4309259Z #47 692.2 copying ./tvm_binding/batch_decode_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4310303Z #47 692.2 copying ./tvm_binding/batch_decode_jit_tvm_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4311275Z #47 692.2 copying ./tvm_binding/batch_mla_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4312246Z #47 692.2 copying ./tvm_binding/batch_mla_jit_tvm_binding.cu -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4313181Z #47 692.2 copying ./tvm_binding/batch_mla_plan.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4314104Z #47 692.2 copying ./tvm_binding/batch_mla_run.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4315034Z #47 692.2 copying ./tvm_binding/batch_prefill.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4316026Z #47 692.2 copying ./tvm_binding/batch_prefill_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4317068Z #47 692.2 copying ./tvm_binding/batch_prefill_jit_tvm_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4318050Z #47 692.2 copying ./tvm_binding/batch_prefill_sm90.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4319115Z #47 692.2 copying ./tvm_binding/batch_prefill_sm90_customize_config.jinja -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4320215Z #47 692.2 copying ./tvm_binding/batch_prefill_sm90_jit_tvm_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4321181Z #47 692.2 copying ./tvm_binding/sampling.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4322102Z #47 692.2 copying ./tvm_binding/sampling_jit_tvm_binding.cu -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4323056Z #47 692.2 copying ./tvm_binding/tvm_binding_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding 2025-09-07T09:27:47.4323865Z #47 692.2 copying ./version.txt -> build/lib.linux-x86_64-cpython-312/flashinfer/data 2025-09-07T09:27:47.4324528Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/logging 2025-09-07T09:27:47.4325394Z 
#47 692.2 copying build/aot-ops-package-dir/logging/logging.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/logging 2025-09-07T09:27:47.4326833Z #47 692.2 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4330120Z #47 692.2 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4333577Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4337004Z #47 692.3 copying build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4340342Z #47 692.3 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.4343376Z #47 692.3 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.4346542Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.4349603Z #47 692.3 copying build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.4352624Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4355742Z #47 692.3 copying 
build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4358835Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4363413Z #47 692.3 copying build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4366673Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.4369648Z #47 692.3 copying 
build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.4372742Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.4376152Z #47 692.3 copying build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.4379540Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4382897Z #47 692.3 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so 
-> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4386332Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4389471Z #47 692.3 copying build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.4392879Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5327359Z #47 692.3 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5330518Z #47 692.3 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5334071Z #47 692.3 copying build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5337370Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5340883Z #47 692.3 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5344475Z #47 692.3 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5347890Z #47 692.3 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5351153Z #47 692.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5354057Z #47 692.4 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5357022Z #47 692.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5360103Z #47 692.4 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5363186Z #47 692.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5367266Z #47 692.4 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5370699Z #47 692.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5374486Z #47 692.4 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5377849Z #47 692.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5380974Z #47 692.4 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5384080Z #47 692.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5387421Z #47 692.4 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.5390311Z #47 692.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5393951Z #47 692.4 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5397330Z #47 692.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5400840Z #47 692.4 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.5404453Z #47 692.4 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6334031Z #47 692.4 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6337383Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6340783Z #47 692.5 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6344201Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.6347462Z #47 692.5 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.6350762Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.6354185Z #47 692.5 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.6357540Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6360593Z #47 692.5 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6363810Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6367069Z #47 692.5 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6370289Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.6373896Z #47 692.5 copying build/aot-ops-package-dir/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.6377300Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.6380852Z #47 692.5 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:47.6384333Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6387527Z #47 692.5 copying build/aot-ops-package-dir/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6390595Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6394259Z #47 692.5 copying 
build/aot-ops-package-dir/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:47.6397721Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.6401283Z #47 692.5 copying build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.6404978Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.6408492Z #47 692.5 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.6411943Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.6415691Z #47 692.5 copying build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.6419234Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.6422864Z #47 692.5 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.6426658Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.7366432Z #47 692.5 copying build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.7370177Z #47 692.5 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.7374100Z #47 692.5 copying 
build/aot-ops-package-dir/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:47.7377614Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T09:27:47.7380672Z #47 692.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T09:27:47.7383701Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T09:27:47.7386856Z #47 692.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T09:27:47.7389710Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T09:27:47.7393018Z #47 692.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T09:27:47.7396288Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T09:27:47.7399479Z #47 692.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T09:27:47.7402612Z #47 692.6 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T09:27:47.7405926Z #47 692.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T09:27:47.7408679Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T09:27:47.7411454Z #47 692.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T09:27:47.7414698Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T09:27:47.7417885Z #47 692.6 copying 
build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T09:27:47.7421075Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T09:27:47.7424284Z #47 692.6 copying build/aot-ops-package-dir/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T09:27:47.7426903Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/fmha_cutlass_sm100a 2025-09-07T09:27:47.7427913Z #47 692.6 copying build/aot-ops-package-dir/fmha_cutlass_sm100a/fmha_cutlass_sm100a.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/fmha_cutlass_sm100a 2025-09-07T09:27:47.7429328Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T09:27:47.7431770Z #47 692.6 copying 
build/aot-ops-package-dir/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T09:27:47.7434480Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T09:27:47.7437141Z #47 692.6 copying build/aot-ops-package-dir/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T09:27:47.7439790Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T09:27:47.7442470Z #47 692.6 copying build/aot-ops-package-dir/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T09:27:47.7445156Z #47 692.6 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T09:27:47.7447879Z #47 692.6 copying build/aot-ops-package-dir/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T09:27:47.7449963Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/mla 2025-09-07T09:27:47.7450708Z #47 692.6 copying build/aot-ops-package-dir/mla/mla.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/mla 2025-09-07T09:27:47.7451469Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/cascade 2025-09-07T09:27:47.7452295Z #47 692.6 copying build/aot-ops-package-dir/cascade/cascade.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/cascade 2025-09-07T09:27:47.7453406Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/norm 2025-09-07T09:27:47.7454307Z #47 692.6 copying build/aot-ops-package-dir/norm/norm.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/norm 2025-09-07T09:27:47.7455155Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/page 2025-09-07T09:27:47.7456024Z #47 692.6 copying build/aot-ops-package-dir/page/page.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/page 2025-09-07T09:27:47.7456915Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/quantization 2025-09-07T09:27:47.7457967Z #47 692.6 copying build/aot-ops-package-dir/quantization/quantization.so -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/quantization 2025-09-07T09:27:47.7458969Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/rope 2025-09-07T09:27:47.7459817Z #47 692.6 copying build/aot-ops-package-dir/rope/rope.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/rope 2025-09-07T09:27:47.7460699Z #47 692.6 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/sampling 2025-09-07T09:27:47.8366028Z #47 692.6 copying build/aot-ops-package-dir/sampling/sampling.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/sampling 2025-09-07T09:27:47.8367008Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/trtllm_utils 2025-09-07T09:27:47.8368386Z #47 692.7 copying build/aot-ops-package-dir/trtllm_utils/trtllm_utils.so -> build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/trtllm_utils 2025-09-07T09:27:47.8369514Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8370682Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/axpby.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8372141Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/clear.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8373801Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/cooperative_copy.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8375374Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/cooperative_gemm.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8376887Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/copy.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8378319Z #47 692.7 copying 
3rdparty/cutlass/include/cute/algorithm/fill.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8379784Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/functional.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8381263Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/gemm.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8382705Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/prefer.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8384309Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/prefetch.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8385800Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/tensor_algorithms.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8387296Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/tensor_reduce.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8388801Z #47 692.7 copying 3rdparty/cutlass/include/cute/algorithm/tuple_algorithms.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:47.8390111Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8391174Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/cluster_sm100.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8392880Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/cluster_sm90.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8394232Z #47 692.7 copying 
3rdparty/cutlass/include/cute/arch/config.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8395617Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/copy.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8397024Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm100.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8398383Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm100_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8399739Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm50.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8401130Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm75.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8402458Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm80.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8403788Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm90.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8405239Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm90_desc.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8406564Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/copy_sm90_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8407855Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8409112Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm100.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8410427Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm100_desc.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8411755Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm100_umma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8413336Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm120.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8414717Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm120_sparse.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8416087Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm61.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8417399Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm70.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8418726Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm75.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8420040Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm80.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8421417Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm89.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8422786Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8424118Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90_desc.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8425581Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90_gmma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8426956Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90_gmma_ext.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8428313Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90_gmma_sparse.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8429715Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/mma_sm90_gmma_sparse_ext.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8431074Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/simd_sm100.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8432446Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/tmem_allocator_sm100.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8433788Z #47 692.7 copying 3rdparty/cutlass/include/cute/arch/util.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:47.8434802Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8435847Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_atom.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8437166Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8438497Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm100.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8439898Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm100_im2col.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8441314Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm100_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8442676Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm50.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8444035Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm75.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8445382Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm80.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8446741Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm90.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8448127Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm90_im2col.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8449521Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm90_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8450938Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/copy_traits_sm90_tma_swizzle.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8452373Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_atom.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8453927Z #47 692.7 copying 
3rdparty/cutlass/include/cute/atom/mma_traits.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8455309Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm100.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8456741Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm120.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8458163Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm120_sparse.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8459595Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm61.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8460977Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm70.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8462362Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm75.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8463781Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm80.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8465266Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm89.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8466610Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm90.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8467975Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm90_gmma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8469369Z #47 
692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm90_gmma_ext.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8470802Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8472273Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse_ext.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8473690Z #47 692.7 copying 3rdparty/cutlass/include/cute/atom/partitioner.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:47.8474965Z #47 692.7 copying 3rdparty/cutlass/include/cute/config.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8475975Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:47.8477134Z #47 692.7 copying 3rdparty/cutlass/include/cute/container/alignment.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:47.8478570Z #47 692.7 copying 3rdparty/cutlass/include/cute/container/array.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:47.8479997Z #47 692.7 copying 3rdparty/cutlass/include/cute/container/array_aligned.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:47.8481467Z #47 692.7 copying 3rdparty/cutlass/include/cute/container/array_subbyte.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:47.8482973Z #47 692.7 copying 3rdparty/cutlass/include/cute/container/bit_field.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:47.8484432Z #47 692.7 
copying 3rdparty/cutlass/include/cute/container/cuda_types.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:47.8485853Z #47 692.7 copying 3rdparty/cutlass/include/cute/container/tuple.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:47.8487270Z #47 692.7 copying 3rdparty/cutlass/include/cute/container/type_list.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:47.8488605Z #47 692.7 copying 3rdparty/cutlass/include/cute/int_tuple.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8489803Z #47 692.7 copying 3rdparty/cutlass/include/cute/layout.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8491020Z #47 692.7 copying 3rdparty/cutlass/include/cute/layout_composed.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8492197Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:47.8493621Z #47 692.7 copying 3rdparty/cutlass/include/cute/numeric/arithmetic_tuple.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:47.8495152Z #47 692.7 copying 3rdparty/cutlass/include/cute/numeric/complex.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:47.8496554Z #47 692.7 copying 3rdparty/cutlass/include/cute/numeric/int.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:47.8498000Z #47 692.7 copying 3rdparty/cutlass/include/cute/numeric/integer_sequence.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:47.8499513Z #47 692.7 copying 3rdparty/cutlass/include/cute/numeric/integral_constant.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:47.8501025Z #47 692.7 copying 3rdparty/cutlass/include/cute/numeric/integral_ratio.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:47.8502456Z #47 692.7 copying 3rdparty/cutlass/include/cute/numeric/math.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:47.8503897Z #47 692.7 copying 3rdparty/cutlass/include/cute/numeric/numeric_types.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:47.8505423Z #47 692.7 copying 3rdparty/cutlass/include/cute/numeric/real.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:47.8506685Z #47 692.7 copying 3rdparty/cutlass/include/cute/pointer.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8507908Z #47 692.7 copying 3rdparty/cutlass/include/cute/pointer_base.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8509160Z #47 692.7 copying 3rdparty/cutlass/include/cute/pointer_flagged.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8510415Z #47 692.7 copying 3rdparty/cutlass/include/cute/pointer_sparse.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8511679Z #47 692.7 copying 3rdparty/cutlass/include/cute/pointer_swizzle.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8512899Z #47 692.7 copying 3rdparty/cutlass/include/cute/stride.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8514090Z #47 692.7 copying 3rdparty/cutlass/include/cute/swizzle.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 
2025-09-07T09:27:47.8515368Z #47 692.7 copying 3rdparty/cutlass/include/cute/swizzle_layout.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8516615Z #47 692.7 copying 3rdparty/cutlass/include/cute/tensor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8517819Z #47 692.7 copying 3rdparty/cutlass/include/cute/tensor_impl.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8519043Z #47 692.7 copying 3rdparty/cutlass/include/cute/tensor_zip.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8520293Z #47 692.7 copying 3rdparty/cutlass/include/cute/underscore.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:47.8521311Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:47.8522339Z #47 692.7 copying 3rdparty/cutlass/include/cute/util/debug.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:47.8523621Z #47 692.7 copying 3rdparty/cutlass/include/cute/util/print.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:47.8524921Z #47 692.7 copying 3rdparty/cutlass/include/cute/util/print_latex.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:47.8526251Z #47 692.7 copying 3rdparty/cutlass/include/cute/util/print_svg.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:47.8527579Z #47 692.7 copying 3rdparty/cutlass/include/cute/util/print_tensor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:47.8528911Z #47 692.7 copying 3rdparty/cutlass/include/cute/util/type_traits.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:47.8529943Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8530979Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/aligned_buffer.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8532021Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8533339Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/arch.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8534723Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/barrier.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8536150Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/cache_operation.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8537593Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/config.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8539065Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/grid_dependency_control.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8540526Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/memory.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8541942Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/memory_sm75.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8543359Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/memory_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8544862Z #47 692.7 
copying 3rdparty/cutlass/include/cutlass/arch/mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8546229Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm100.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8547594Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm50.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8548931Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm60.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8550279Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm61.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8551630Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm70.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8552962Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm75.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8554288Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8555615Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm89.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8556978Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/mma_sm90.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8558338Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/mma_sparse_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8559738Z #47 692.7 copying 
3rdparty/cutlass/include/cutlass/arch/mma_sparse_sm89.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8561131Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/reg_reconfig.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8562472Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/simd.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8563801Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/simd_sm60.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8565156Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/simd_sm61.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8566516Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/synclog.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8567859Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/wmma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8569182Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/wmma_sm70.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8570531Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/wmma_sm72.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8571874Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/arch/wmma_sm75.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:47.8573393Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/array.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8574710Z #47 692.7 copying 
3rdparty/cutlass/include/cutlass/array_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8576062Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/array_subbyte.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8577390Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/barrier.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8578703Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/bfloat16.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8579953Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/blas3.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8581213Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/blas3_types.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8582553Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/block_striped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8583882Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/cluster_launch.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8585291Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8586536Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/constants.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8587684Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T09:27:47.8589228Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/collective/builders/sm100_common.inl -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T09:27:47.8591161Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/collective/builders/sm100_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T09:27:47.8593399Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/collective/builders/sm90_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T09:27:47.8595374Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/collective/builders/sm90_gmma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T09:27:47.8597288Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/collective/collective_builder.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:47.8599091Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/collective/collective_conv.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:47.8600830Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/collective/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:47.8602674Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/collective/sm100_implicit_gemm_umma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:47.8604800Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/collective/sm90_implicit_gemm_gmma_ss_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:47.8606516Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/conv2d_problem_size.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:47.8607969Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/conv3d_problem_size.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:47.8609446Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/convnd_problem_shape.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:47.8611220Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:47.8612746Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:47.8614088Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T09:27:47.8615429Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/device/conv_universal_adapter.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T09:27:47.8617171Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/device/direct_convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T09:27:47.8618889Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/device/implicit_gemm_convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T09:27:47.8620673Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/device/implicit_gemm_convolution_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T09:27:47.8622325Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/dispatch_policy.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv 
2025-09-07T09:27:47.8623590Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8624886Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/conv_universal.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8626604Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8628217Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_dgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8629830Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8631498Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8633223Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8634967Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8636741Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8638464Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_group_fprop.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8640113Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8641778Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8643439Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv3d_dgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8645114Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8646788Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8648528Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8650243Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_conv3d_wgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8651863Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_deconv2d.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8653785Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_deconv2d_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8655494Z 
#47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_deconv3d.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8657247Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_deconv3d_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8659002Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/default_depthwise_fprop.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8660693Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/direct_convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8662414Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8664193Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8666105Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_strided_dgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8667905Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8669741Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_fused_epilogue.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8671585Z #47 692.7 copying 
3rdparty/cutlass/include/cutlass/conv/kernel/sm100_implicit_gemm_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8673419Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/kernel/sm90_implicit_gemm_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:47.8674774Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/thread 2025-09-07T09:27:47.8676020Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/thread/depthwise_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/thread 2025-09-07T09:27:47.8677320Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8678839Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8680869Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8682967Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8685075Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8687180Z #47 692.7 copying 
3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8689289Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_few_channels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8691377Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_fixed_channels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8693901Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8696028Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8698137Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_few_channels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8700256Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_fixed_channels.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8702375Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8704424Z #47 692.7 copying 
3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8706145Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8708058Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8710141Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8712287Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8714438Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8716515Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8718577Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8720645Z #47 692.7 copying 
3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8722763Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8724876Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8726953Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8729005Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8731033Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8733152Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8735102Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8737232Z #47 692.7 copying 
3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8739387Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_analytic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8741561Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8743591Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_direct_conv_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8745822Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_fixed_stride_dilation.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8748156Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8750275Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_direct_conv_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8752356Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_filter_tile_access_iterator_direct_conv_optimized.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8754370Z #47 
692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8756157Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_mma_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8757983Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/depthwise_mma_core_with_lane_access_size.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8759948Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/implicit_gemm_fprop_fusion_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8761821Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/implicit_gemm_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8763614Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/implicit_gemm_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8765485Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/implicit_gemm_wgrad_fusion_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8767459Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8769412Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_iterator.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8771272Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/threadblock/threadblock_swizzle.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:47.8772678Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T09:27:47.8774139Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/warp/mma_depthwise_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T09:27:47.8775805Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/warp/mma_depthwise_simt_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T09:27:47.8777497Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/conv/warp/scale_bias_relu_transform.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T09:27:47.8778957Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/coord.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8780206Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/core_io.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8781550Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/cuda_host_adapter.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8782904Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/cutlass.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8783977Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8785324Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/blockwise_scale_layout.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8786863Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/cluster.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8788311Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/collective.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8789542Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/collective 2025-09-07T09:27:47.8790923Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/collective/mixed_input_utils.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/collective 2025-09-07T09:27:47.8792936Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/dependent_false.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8794490Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/helper_macros.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8795991Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/layout.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8797584Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/mainloop_fusion_helper_scale_factor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8799173Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/mma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8800703Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/sm100_blockscaled_layout.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8802284Z 
#47 692.7 copying 3rdparty/cutlass/include/cutlass/detail/sm100_tmem_helper.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:47.8803732Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/device_kernel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.8805067Z #47 692.7 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:47.8806648Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/builders/sm100_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:47.8808661Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/builders/sm120_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:47.8810657Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/builders/sm120_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:47.8812733Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/builders/sm90_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:47.8830865Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/builders/sm90_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:47.8832880Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/collective_builder.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8834760Z #47 692.7 copying 
3rdparty/cutlass/include/cutlass/epilogue/collective/collective_epilogue.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8836677Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/default_epilogue.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8838545Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/default_epilogue_array.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8840363Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8842208Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/epilogue_tensor_broadcast.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8844176Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_nosmem.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8846159Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8848131Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_nosmem.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8850074Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 
2025-09-07T09:27:47.8852026Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8854322Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized_array.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8856382Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_array_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.8858470Z #47 692.7 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.9367285Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized_bias_elementwise.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:47.9369255Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/dispatch_policy.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue 2025-09-07T09:27:47.9370541Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9371875Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/callbacks.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9374115Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/operations.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9375987Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/fusion/sm100_callbacks_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9378057Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_compute_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9380066Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_store_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9382064Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm120_callbacks_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9384056Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm120_visitor_store_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9386144Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9388080Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_compute_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9390015Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_load_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9392082Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_store_tma_warpspecialized.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9394215Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9396119Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:47.9397533Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9398900Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/activation.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9400644Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/conversion_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9402349Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9404096Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9406032Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_elementwise.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9407935Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_relu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9409790Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_clamp.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9411587Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_dgelu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9413673Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_drelu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9415510Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_gelu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9417362Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_generic.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9419274Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_generic_with_scaling.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9421257Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_hardswish.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9423139Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_leaky_relu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9425098Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9426941Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9428764Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_relu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9430529Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_relu0.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9432356Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_residual_block.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9434196Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_sigmoid.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9435971Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_silu.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9437819Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_tensor_broadcast.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9439726Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/linear_combination_with_elementwise.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9441502Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/thread/reduction_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9443196Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/thread/scale_type.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:47.9444549Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9446067Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9448138Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op_blas3.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9450124Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_direct_store.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9452098Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9454330Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9456316Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9458335Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op_blas3.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9460377Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_volta_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9462392Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9464419Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9466525Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9468482Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_wmma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9470407Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9472313Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9474261Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_volta_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9476208Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_wmma_tensor_op.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9478206Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/direct_store_epilogue_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9480111Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9481887Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9483763Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_base_streamk.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9485642Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_depthwise.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9487511Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_direct_store.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9489413Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_gemm_k_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9491343Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9493685Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_smem_accumulator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9495714Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_streamk_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9497750Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_visitor_with_softmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9499716Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9501664Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9503623Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9505660Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9507591Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor_callbacks.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9509518Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/epilogue_workspace.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9510993Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:47.9512530Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_2x.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:47.9514604Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_compute.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:47.9516611Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_load.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:47.9518637Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_store.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:47.9520612Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/fusion/visitors.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:47.9522546Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/interleaved_epilogue.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9524462Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/output_iterator_parameter.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9526387Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/output_tile_thread_map.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9528302Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9530261Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9532303Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine_layout_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9534633Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_blas3.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9536672Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_conv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9538717Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_direct_conv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9540802Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9542880Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_predicates.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 
2025-09-07T09:27:47.9545105Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_strided_dgrad.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9547055Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9548987Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_mixed.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9550975Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_pitch_linear.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:47.9552419Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9553845Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_complex_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9555712Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_gaussian_complex_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9557533Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9559274Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9561075Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_volta_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9562870Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9564566Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/simt_policy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9566182Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tensor_op_policy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9567831Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tile_iterator_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9569508Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9571217Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op_mixed.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9573209Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tile_iterator_volta_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9575014Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9576769Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/volta_tensor_op_policy.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9578509Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/epilogue/warp/wmma_tensor_op_policy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:47.9580021Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/exmy_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.9581242Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T09:27:47.9582932Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/device/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T09:27:47.9585283Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/device/dist_gemm_universal_wrapper.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T09:27:47.9587418Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/device/full_barrier.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T09:27:47.9589066Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T09:27:47.9590672Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/kernel/detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T09:27:47.9593082Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/kernel/dist_gemm_kernel_wrapper.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 
2025-09-07T09:27:47.9595349Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/kernel/full_barrier.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T09:27:47.9597029Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T09:27:47.9598775Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_1d_schedules.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T09:27:47.9601105Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_base_schedule.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T09:27:47.9602899Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/fast_math.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.9604158Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/float8.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.9605518Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/float_subbyte.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.9606823Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/floating_point_nvrtc.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.9608140Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/functional.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:47.9609315Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9610857Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_9xBF16_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9613180Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_sparse_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9615340Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9617534Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_blockwise_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9619620Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9621634Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_pipeline_carveout.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9623724Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_simt_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9625846Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_sparse_umma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9627822Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm100_umma_builder.inl -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9629852Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_mma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9631919Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_sparse_mma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9633966Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_blockwise_mma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9635918Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9637829Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_mma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9639783Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm120_sparse_mma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9641721Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm1xx_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9643639Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm1xx_sparse_config.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 
2025-09-07T09:27:47.9645548Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm90_common.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9647450Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm90_gmma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9649380Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_config.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9651329Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_gmma_builder.inl -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:47.9653512Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/collective_builder.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9655380Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/collective_builder_decl.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9657183Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/collective_mma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9659001Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/collective_mma_decl.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9660786Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/fp8_accumulation.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 
2025-09-07T09:27:47.9662687Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9664843Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9666799Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_sparse_mma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9668702Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9670642Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9672623Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_emulated.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9674489Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9676371Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9678292Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_emulated.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9680164Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm100_sparse_mma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9682012Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_array_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9683814Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9685614Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_sparse_mma_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9687504Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_mma_array_tma_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9689307Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_mma_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9691038Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_mma_tma_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9693239Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm120_sparse_mma_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9695010Z #47 
692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm70_mma_twostage.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9696797Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm80_mma_array_multistage.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9698608Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm80_mma_multistage.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9700603Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9702643Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9704645Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9706808Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9708860Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_rs_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9710825Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_ss_warpspecialized.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9714997Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9716993Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9718836Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9720616Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9722511Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9724598Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9726605Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:47.9728588Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized_fp8.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective 
2025-09-07T09:27:47.9730021Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9731238Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/base_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9733112Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/default_gemm_configuration.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9734752Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/ell_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9736317Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9737861Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_array.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9739430Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_batched.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9741018Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9742602Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9744264Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_layernorm_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9746019Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/gemm/device/gemm_sparse.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9747591Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_sparse_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9749261Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_sparse_universal_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9750994Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_sparse_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9752622Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_sparse_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9754257Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_splitk_parallel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9755843Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9757501Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9759119Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9760788Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_streamk_with_broadcast.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9762538Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9764222Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_universal_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9765871Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemm_with_k_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9767406Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/gemv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9768904Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/rank_2k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9770424Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/rank_2k_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9771936Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/rank_k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9773682Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/symm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9775185Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/device/trmm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:47.9776695Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/dispatch_policy.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T09:27:47.9778134Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T09:27:47.9779547Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/gemm_enumerated_types.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T09:27:47.9781098Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/group_array_problem_shape.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T09:27:47.9782339Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9783623Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_ell_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9785334Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9786906Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9788533Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9790294Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_per_group_scale.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9792258Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_softmax_mainloop_fusion.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9794311Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_layernorm_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9796211Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_planar_complex_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9797960Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9799677Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9801526Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9803336Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9805216Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9806925Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_splitk_parallel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9808643Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_streamk_with_broadcast.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9810338Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9812045Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_universal_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9814018Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9815737Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_with_broadcast.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9817478Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_with_k_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9819210Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemm_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9820874Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_gemv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9822467Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_2k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9824199Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_2k_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9825966Z #47 
692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_2k_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9827607Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_2k_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9829240Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9830839Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_k_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9832482Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_rank_k_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9834082Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_symm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9835715Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_symm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9837349Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_symm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9838939Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_trmm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9840523Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_trmm_complex.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9842147Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/default_trmm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9843715Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/ell_gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9845203Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9846684Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_array.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9848205Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_batched.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9849751Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9851358Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_grouped_per_group_scale.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9853287Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_grouped_problem_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9855059Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_grouped_softmax_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9856891Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_layernorm_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9858590Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9860178Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9861846Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9863523Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex_array.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9865305Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9866980Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9868698Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_splitk_parallel.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9870354Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_streamk_with_fused_epilogue.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9872040Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_transpose_operands.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9873656Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9875215Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9876807Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal_decl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9878424Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal_streamk.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9880074Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9881796Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor_streamk.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9883453Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9885051Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_with_fused_epilogue.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9886684Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemm_with_k_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9888220Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/gemm/kernel/gemv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9889797Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/gemv_batched_strided.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9891426Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/grouped_problem_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9893459Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/params_sparse_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9895198Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/params_universal_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9896853Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9898541Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped_problem_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9900292Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/rank_2k_transpose_operands.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9902013Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/rank_2k_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9903639Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/rank_k_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 
2025-09-07T09:27:47.9905480Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9907364Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_input_transform.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9909294Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_mma_transform.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9911137Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9912961Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_input_transform.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9914845Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_mma_transform.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9916689Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_sparse_gemm_tma_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9918461Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_static_tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9920127Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9921815Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_group.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9923604Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_stream_k.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9925454Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm120_gemm_tma_warpspecialized_cooperative_asymmetric_dma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9927233Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm70_gemm.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9928829Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm70_gemm_array.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9930565Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_cooperative.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9932536Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_pingpong.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9934516Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9936210Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9938067Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9939969Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9941784Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9943607Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_cooperative.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9945574Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_pingpong.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9947275Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9948940Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_group.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9950632Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_stream_k.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9952253Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sparse_gemm.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9953846Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_absmax.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9955490Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_visitor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9957206Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/static_tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9958826Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/symm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9960397Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/tile_scheduler.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9962071Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/tile_scheduler_detail.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9963720Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/tile_scheduler_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9965307Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/kernel/trmm_universal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:47.9966543Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T09:27:47.9967764Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/thread/mma.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T09:27:47.9969238Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/thread/mma_sm50.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T09:27:47.9970731Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/thread/mma_sm60.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T09:27:47.9972234Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/thread/mma_sm61.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T09:27:47.9973713Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9975099Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_ell_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9976871Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_gemv_core.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9978619Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9980365Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9982150Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9983962Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm70.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9985854Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm75.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9987596Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9989411Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sparse_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9991287Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_access_size.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9993465Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9995380Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_core_wmma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9997274Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_layernorm_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:47.9999247Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0001231Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0003491Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_softmax_mainloop_fusion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0005484Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_mma_with_reduction.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0007330Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0009215Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0011121Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0013273Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_multistage_trmm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0015138Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_sparse_mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0016903Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/default_trmm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0018674Z 
#47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/ell_mma_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0020458Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/ell_mma_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0022169Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/gemv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0023871Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/index_remat.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0025720Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0027440Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_blas3_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0029285Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_layernorm_mainloop_fusion_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0031143Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0032829Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0034574Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_base.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0036375Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0038224Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_pipelined.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0039990Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_singlestage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0041811Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_softmax_mainloop_fusion_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0043613Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_sparse_base.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0045346Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_sparse_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0047151Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/mma_with_reduction_multistage.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0048949Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0050767Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle_streamk.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.0052135Z #47 692.8 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0053660Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_complex_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0055363Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_sparse_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0057022Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0058698Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0060443Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_with_reduction_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0062143Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/default_mma_wmma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0063879Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/layernorm_scale_bias_transform.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0065578Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0067063Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0068671Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_fast_f32.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0070375Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_tile_iterator_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0072051Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0073793Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op_tile_iterator_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0075504Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_mixed_input_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0077071Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0078580Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_simt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0080072Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_simt_policy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0081611Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_simt_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0083185Z #47 692.8 copying 
3rdparty/cutlass/include/cutlass/gemm/warp/mma_sparse_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0084719Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0086233Z #47 692.8 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fast_f32.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0420204Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fragment_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0421952Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_policy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0423695Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_sm70.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0425519Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0427192Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0428868Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm70.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0430533Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm80.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0432201Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sparse.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0433862Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_wmma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0435495Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_wmma.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0437080Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/mma_with_reduction_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0438691Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/scale_bias_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0440337Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/softmax_scale_bias_transform.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0442074Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm/warp/tile_iterator_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.0443482Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm_coord.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0444709Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/gemm_coord.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0445904Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/half.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0447100Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/integer_subbyte.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0448391Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/kernel_hardware_info.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0449702Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/kernel_hardware_info.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0450989Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/kernel_launch.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0452024Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.0453384Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/layout/layout.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.0454821Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/layout/matrix.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.0456341Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/layout/permute.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.0457800Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/layout/pitch_linear.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.0459272Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/layout/tensor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.0460849Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm70.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.0462484Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm75.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.0464115Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.0465813Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/layout/vector.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.0467116Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/matrix.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0468328Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/matrix_coord.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0469548Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/matrix_shape.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0470815Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/numeric_conversion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0472085Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/numeric_size.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0473305Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/numeric_types.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0474355Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T09:27:48.0475504Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/pipeline/pipeline.hpp -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T09:27:48.0476985Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/pipeline/sm100_pipeline.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T09:27:48.0478477Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/pipeline/sm90_pipeline.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T09:27:48.0479848Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/pitch_linear_coord.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0480912Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/platform 2025-09-07T09:27:48.0482063Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/platform/platform.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/platform 2025-09-07T09:27:48.0483411Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/predicate_vector.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0484677Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/quaternion.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0485887Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/real.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0486970Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T09:27:48.0488445Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/reduction/device/reduce_split_k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T09:27:48.0490137Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/reduction/device/tensor_reduce.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T09:27:48.0492161Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/reduction/device/tensor_reduce_affine_contiguous.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T09:27:48.0494315Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/reduction/device/tensor_reduce_affine_strided.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T09:27:48.0495747Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T09:27:48.0497158Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/reduction/kernel/reduce_softmax_final.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T09:27:48.0498991Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/reduction/kernel/reduce_split_k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T09:27:48.0500818Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/reduction/kernel/tensor_reduce_affine_contiguous.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T09:27:48.0502728Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/reduction/kernel/tensor_reduce_affine_strided.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T09:27:48.0504170Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T09:27:48.0505558Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/reduction/thread/reduce.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T09:27:48.0507211Z #47 692.9 copying 
3rdparty/cutlass/include/cutlass/reduction/thread/reduction_operators.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T09:27:48.0508855Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/reduction/threadblock_swizzle.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction 2025-09-07T09:27:48.0510264Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/relatively_equal.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0511527Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/semaphore.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0512778Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/subbyte_reference.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0514045Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/tensor_coord.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0515271Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/tensor_ref.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0516539Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/tensor_ref_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0517825Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/tensor_view.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0519163Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/tensor_view_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0520475Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/tfloat32.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0521490Z #47 692.9 creating 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/thread 2025-09-07T09:27:48.0522574Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/thread/matrix.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/thread 2025-09-07T09:27:48.0523864Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/trace.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0524943Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/collective 2025-09-07T09:27:48.0526349Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/collective/sm90_wgmma_transpose.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/collective 2025-09-07T09:27:48.0527787Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/device 2025-09-07T09:27:48.0529214Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/device/transform_universal_adapter.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/device 2025-09-07T09:27:48.0530591Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T09:27:48.0531952Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/kernel/filter_format_transformer.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T09:27:48.0534042Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/kernel/sm90_sparse_gemm_compressor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T09:27:48.0535946Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/kernel/sparse_gemm_compressor.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T09:27:48.0537705Z #47 
692.9 copying 3rdparty/cutlass/include/cutlass/transform/pitch_linear_thread_map.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform 2025-09-07T09:27:48.0539039Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T09:27:48.0540417Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/thread/transpose.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T09:27:48.0542126Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/thread/unary_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T09:27:48.0543513Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0545100Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/ell_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0547012Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/ell_predicated_tile_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0549005Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/ell_predicated_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0551048Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_scale_bias_vector_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0553189Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_scale_bias_vector_iterator.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0555230Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0557348Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_2dthreadtile.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0559449Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_params.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0561582Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_triangular_matrix.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0563651Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0565683Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator_2dthreadtile.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0567771Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator_triangular_matrix.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0569847Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/predicated_vector_access_iterator.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0571906Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_scale_bias_vector_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0574215Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0576309Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_pitch_linear.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0578484Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_pitch_linear_direct_conv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0580721Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0582855Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op_sm80.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0585013Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0586961Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_pitch_linear.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0589124Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_pitch_linear_2dthreadtile.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0591184Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_tensor_op.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0593583Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_tensor_op_sm70.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0595580Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/threadblock/vector_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.0597024Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/warp 2025-09-07T09:27:48.0598416Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/transform/warp/vector_fragment_iterator.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/warp 2025-09-07T09:27:48.0599998Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/uint128.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0601263Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/version.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0602534Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/wmma_array.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0603827Z #47 692.9 copying 3rdparty/cutlass/include/cutlass/workspace.h 
-> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.0605057Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0606358Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/GPU_Clock.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0608010Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/command_line.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0609695Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/cublas_wrappers.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0611356Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/debug.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0613253Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_dump.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0614979Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_groupnorm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0616742Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_layernorm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0618481Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_memory.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0620205Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_nchw_to_nhwc.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0622029Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_nhwc_padding.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0623852Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_nhwc_pooling.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0625686Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_nhwc_to_nchw.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0627403Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_rmsnorm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0629058Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/device_utils.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0630740Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/distribution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0632426Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/exceptions.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0634140Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/gett_commandline.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0635842Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/helper_cuda.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0637508Z #47 
692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/host_reorder.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0639143Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/host_tensor.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0640852Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/host_tensor_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0642580Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/host_uncompress.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0644247Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/index_sequence.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0645952Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0647669Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/packed_stride.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0649342Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/print_error.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0650754Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T09:27:48.0652432Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/detail/inner_product.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T09:27:48.0654783Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/detail/linear_to_coordinate.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T09:27:48.0656553Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0658232Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0660338Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0662468Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/gemm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0664636Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/gemm_planar_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0666822Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/gett.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0668480Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T09:27:48.0670200Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/kernel/gemm.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T09:27:48.0672443Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/kernel/tensor_elementwise.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T09:27:48.0674761Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/kernel/tensor_foreach.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T09:27:48.0676945Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/rank_2k_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0679026Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/tensor_compare.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0681101Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0683183Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/tensor_foreach.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0685269Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/tensor_reduce.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0687345Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/tensor_relu.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.0688999Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/thread 2025-09-07T09:27:48.0690698Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/device/thread/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/thread 2025-09-07T09:27:48.0692831Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0694440Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/conv.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0696510Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/convolution.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0698696Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/error_metrics.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0700757Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/gemm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0702808Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/gemm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0705079Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/gemm_planar_complex.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0707117Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/gett.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0709076Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/rank_2k.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0711086Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/rank_2k_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0713128Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/rank_k_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0715114Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/symm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0717113Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/symm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0719157Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_compare.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0721217Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_compare.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 
2025-09-07T09:27:48.0723274Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_copy.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0725344Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_elementwise.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0727457Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_fill.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0729537Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_fill.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0731593Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_foreach.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0733920Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_norm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0736018Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_reduce.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0738144Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/tensor_reduce.hpp -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0740235Z #47 692.9 copying 
3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/trmm.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0742292Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/reference/host/trmm_complex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.0744207Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/tensor_view_io.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0745985Z #47 692.9 copying 3rdparty/cutlass/tools/util/include/cutlass/util/type_traits.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.0747191Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0748172Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/async.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0749397Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/async_logger-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0750660Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/async_logger.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0751663Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T09:27:48.0752690Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/cfg/argv.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T09:27:48.0753932Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/cfg/env.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T09:27:48.0755202Z #47 692.9 copying 
3rdparty/spdlog/include/spdlog/cfg/helpers-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T09:27:48.0756517Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/cfg/helpers.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T09:27:48.0757775Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/common-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0758977Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/common.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0759992Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0761181Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/backtracer-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0762671Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/backtracer.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0764106Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/circular_q.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0765563Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/console_globals.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0767023Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/file_helper-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0768466Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/file_helper.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0769866Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/fmt_helper.h -> 
build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0771261Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/log_msg-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0772737Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/log_msg.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0774355Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/log_msg_buffer-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0775839Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/log_msg_buffer.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0777314Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/mpmc_blocking_q.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0778779Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/null_mutex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0780201Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/os-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0781568Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/os.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0783034Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/periodic_worker-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0784570Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/periodic_worker.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0786142Z #47 692.9 copying 
3rdparty/spdlog/include/spdlog/details/registry-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0787570Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/registry.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0789038Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/synchronous_factory.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0790528Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/tcp_client-windows.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0792111Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/tcp_client.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0793782Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/thread_pool-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0795323Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/thread_pool.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0796821Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/udp_client-windows.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0798294Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/udp_client.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0799813Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/details/windows_include.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.0800981Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.0802043Z #47 692.9 copying 
3rdparty/spdlog/include/spdlog/fmt/bin_to_hex.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.0803167Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0804468Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/args.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0805983Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/chrono.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0807448Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/color.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0808903Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/compile.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0810367Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/core.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0811851Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/fmt.license.rst -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0813625Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/format-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0815156Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/format.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0816661Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/locale.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0818134Z #47 692.9 copying 
3rdparty/spdlog/include/spdlog/fmt/bundled/os.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0819630Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/ostream.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0821127Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/printf.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0822634Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/ranges.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0824119Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/std.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0825670Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/bundled/xchar.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.0827111Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/chrono.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.0828386Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/compile.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.0829629Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/fmt.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.0830869Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/ostr.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.0832145Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/ranges.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.0833403Z #47 692.9 copying 
3rdparty/spdlog/include/spdlog/fmt/std.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.0834656Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fmt/xchar.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.0835885Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/formatter.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0837113Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/fwd.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0838305Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/logger-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0839499Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/logger.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0840668Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/mdc.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0841904Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/pattern_formatter-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0843230Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/pattern_formatter.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0844283Z #47 692.9 creating build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0845381Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/android_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0846799Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/ansicolor_sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0848223Z #47 692.9 
copying 3rdparty/spdlog/include/spdlog/sinks/ansicolor_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0849604Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/base_sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0850962Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/base_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0852340Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/basic_file_sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0854078Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/basic_file_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0855512Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/callback_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0856980Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/daily_file_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0858416Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/dist_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0859819Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/dup_filter_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0861247Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/hourly_file_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0862696Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/kafka_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0864087Z 
#47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/mongo_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0865554Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/msvc_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0866888Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/null_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0868225Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/ostream_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0869606Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/qt_sinks.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0870979Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/ringbuffer_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0872411Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/rotating_file_sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0873869Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/rotating_file_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0875252Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0876562Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0877944Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/stdout_color_sinks-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0879382Z #47 
692.9 copying 3rdparty/spdlog/include/spdlog/sinks/stdout_color_sinks.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0880790Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/stdout_sinks-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0882195Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/stdout_sinks.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0883556Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/syslog_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0884926Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/systemd_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0886275Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/tcp_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0887581Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/udp_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0888972Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/win_eventlog_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0890405Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/wincolor_sink-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0891802Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/sinks/wincolor_sink.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.0893570Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/spdlog-inl.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0894871Z #47 692.9 
copying 3rdparty/spdlog/include/spdlog/spdlog.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0896102Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/stopwatch.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0897353Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/tweakme.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0898571Z #47 692.9 copying 3rdparty/spdlog/include/spdlog/version.h -> build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.0899362Z #47 692.9 running build_ext 2025-09-07T09:27:48.0899729Z #47 692.9 installing to build/bdist.linux-x86_64/wheel 2025-09-07T09:27:48.0900159Z #47 692.9 running install 2025-09-07T09:27:48.0900464Z #47 693.0 running install_lib 2025-09-07T09:27:48.1420318Z #47 693.0 creating build/bdist.linux-x86_64/wheel 2025-09-07T09:27:48.1420867Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer 2025-09-07T09:27:48.1421680Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1422728Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/__main__.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1423820Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/activation.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1424994Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/aot.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1426023Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/artifacts.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1427097Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/attention.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1428147Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/autotuner.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1429197Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/cascade.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1430240Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/cuda_utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1431285Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/decode.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1432314Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/deep_gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1433395Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/fp4_quantization.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1434511Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/fp8_quantization.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1435577Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1436591Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/green_ctx.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1437777Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/mla.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1438816Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/norm.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1439822Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/page.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1440817Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/pod.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1441815Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/prefill.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1442930Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/quantization.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1443970Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/rope.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1444999Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/sampling.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1446038Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/sparse.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1447069Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/tllm_utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1448150Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1449174Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/_build_meta.py -> build/bdist.linux-x86_64/wheel/./flashinfer 2025-09-07T09:27:48.1449979Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/cute_dsl 2025-09-07T09:27:48.1450917Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/cute_dsl/blockscaled_gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/cute_dsl 2025-09-07T09:27:48.1452139Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/cute_dsl/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/cute_dsl 2025-09-07T09:27:48.1453285Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data 2025-09-07T09:27:48.1454180Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/custom_backend.py -> build/bdist.linux-x86_64/wheel/./flashinfer/data 2025-09-07T09:27:48.1455359Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/setup.py -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data 2025-09-07T09:27:48.1456228Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc 2025-09-07T09:27:48.1457179Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/activation.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1458529Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/aot_extension_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1459895Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_attention.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1461333Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_attention_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1462847Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_attention_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1464362Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_attention_paged_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1465864Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1467229Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1468668Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1470094Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_jit_pybind.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1471496Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1472954Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_mla_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1474352Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_mla_cute_sm80.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1475732Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_mla_plan.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1477082Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_mla_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1478474Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_decode_mla_run.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1479826Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1481125Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_plan.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1482422Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1483700Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_run.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1484993Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_sm90_plan.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1486331Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_sm90_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1487647Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_mla_sm90_run.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1488940Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1490353Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1491785Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1493747Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_fp8_paged_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1495367Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_fp8_ragged_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1496878Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_fp8_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1498654Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1500277Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_paged_kernel_inst.jinja -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1501954Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_paged_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1503723Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_ragged_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1505449Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_ragged_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1507000Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1508463Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_sm90_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1510054Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_sm90_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1511687Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/batch_prefill_sm90_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1513175Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/blackwell_fmha_plan.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1514593Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/bmm_fp8.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1531408Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/cascade.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1533157Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/cudnn_sdpa_kernel_launcher.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1534553Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/cudnn_sdpa_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1535876Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/cutlass_mla.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1537246Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_cascade_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1538641Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_gemm_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1540053Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_gemm_sm90_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1541448Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_mla_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1542809Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_norm_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1544174Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1545612Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_ops_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1547030Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_page_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1548456Z #47 
693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_quantization_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1549841Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_rope_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1551248Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/flashinfer_sampling_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1552613Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fmha_cutlass_sm100.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1553952Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fmha_cutlass_sm100_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1555299Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fp4_gemm_cutlass.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1556618Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fp4_gemm_cutlass.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1557955Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fp8_gemm_cutlass.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1559279Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fp8_gemm_cutlass.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1560279Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/fused_moe 2025-09-07T09:27:48.1561004Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T09:27:48.1562387Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend/cutlass_fused_moe_instantiation.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T09:27:48.1564306Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend/cutlass_fused_moe_kernels.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T09:27:48.1566258Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_ops.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/fused_moe/cutlass_backend 2025-09-07T09:27:48.1567928Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/gemm_groupwise_sm100.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1569339Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/gemm_groupwise_sm100_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1570741Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/gemm_sm100_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1572010Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1573614Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_fp8_groupwise_sm100.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1575160Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_fp8_groupwise_sm100_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1576716Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_mxfp4_groupwise_sm100.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1578357Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_mxfp4_groupwise_sm100_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1579883Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_sm100_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1581254Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1582692Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/group_gemm_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1584084Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/logging.cc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1585427Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/norm.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.1586375Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal 2025-09-07T09:27:48.1587074Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/cpp 2025-09-07T09:27:48.1587815Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:48.1589107Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common/envUtils.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:48.1590812Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common/logger.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:48.1592920Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common/memoryUtils.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:48.1594732Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common/stringUtils.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:48.1596543Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/common/tllmException.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/common 2025-09-07T09:27:48.1597879Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/cpp/kernels 2025-09-07T09:27:48.1599212Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/cpp/kernels/quantization.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/cpp/kernels 2025-09-07T09:27:48.1600499Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/include 2025-09-07T09:27:48.1601348Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/include/tensorrt_llm 2025-09-07T09:27:48.1602305Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:48.1603870Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/NvInferRuntime.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:48.1606047Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/assert.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:48.1608086Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaBf16Wrapper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:48.1610282Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaFp8Utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:48.1612324Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:48.1614728Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/dataType.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:48.1616787Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/logger.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:48.1618878Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/quantization.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:48.1621008Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/stringUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 2025-09-07T09:27:48.1623179Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/tllmException.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common 
2025-09-07T09:27:48.1624797Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm 2025-09-07T09:27:48.1625624Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:48.1627022Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cublasMMWrapper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:48.1628992Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaBf16Fallbacks.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:48.1630972Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaDriverWrapper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:48.1632913Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaTypeUtils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:48.1634808Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/envUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:48.1636680Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/memoryUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:48.1638592Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/quantTypeUtils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:48.1640571Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:48.1642528Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/common/workspace.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/common 2025-09-07T09:27:48.1643984Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions 2025-09-07T09:27:48.1645024Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include 2025-09-07T09:27:48.1646194Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:48.1647524Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:48.1649596Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_red_global.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:48.1652493Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_sm90_multimem.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:48.1655643Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_traits_sm90_multimem.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:48.1658635Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/grid_dependency_control.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:48.1661517Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch 2025-09-07T09:27:48.1663636Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication 2025-09-07T09:27:48.1665375Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective 2025-09-07T09:27:48.1668027Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective/sm90_allreduce_nvls_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective 2025-09-07T09:27:48.1671140Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/compute_occupancy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:48.1673166Z #47 693.0 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail 2025-09-07T09:27:48.1674579Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective 2025-09-07T09:27:48.1676886Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective/mixed_input_utils.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective 2025-09-07T09:27:48.1679199Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue 2025-09-07T09:27:48.1680623Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective 2025-09-07T09:27:48.1683104Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective/epilogue_moe_finalize.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective 2025-09-07T09:27:48.1685349Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion 2025-09-07T09:27:48.1687692Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_allreduce_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion 
2025-09-07T09:27:48.1690035Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread 2025-09-07T09:27:48.1692584Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread/fused_activations.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread 2025-09-07T09:27:48.1695589Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue_helpers.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:48.1697683Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm 2025-09-07T09:27:48.1699112Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1700625Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T09:27:48.1703131Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_gated.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T09:27:48.1706598Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_interleaved.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T09:27:48.1709906Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_mixed_input.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders 2025-09-07T09:27:48.1713169Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_gated.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1716321Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_interleaved.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1719492Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_mixed_input.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1722615Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_array_mixed_input.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1725713Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_gated.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1728771Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_interleaved.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1731961Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input_.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1735599Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1738987Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized_fp8.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1742494Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_interleaved_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective 2025-09-07T09:27:48.1745097Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1747326Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/default_fpA_intB_traits.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1750370Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1753424Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_routine.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 
2025-09-07T09:27:48.1756500Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_traits.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1759473Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_moe_problem_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1762426Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_universal_allreduce.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1765358Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/mixed_gemm_B_layout.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1768224Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cute_util.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1771070Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cutlass_kernel.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1774235Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_problem_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1777423Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1780798Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized_pingpong.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel 2025-09-07T09:27:48.1783292Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1785821Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1788741Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_multistage.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1791619Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1795098Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1798247Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma_bf16.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1801374Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1804534Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1807847Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_finegrained.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1811002Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_percol.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1814396Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1817613Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_finegrained.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1820929Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_percol.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock 2025-09-07T09:27:48.1823311Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T09:27:48.1825742Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/default_mma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T09:27:48.1828898Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_compute_B_with_f16.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T09:27:48.1831941Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_dequantizer.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp 2025-09-07T09:27:48.1834767Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm_configs.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:48.1837548Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/interleaved_numeric_conversion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:48.1840324Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/system_barrier.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 
2025-09-07T09:27:48.1843061Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/tile_interleaved_layout.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:48.1845133Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform 2025-09-07T09:27:48.1846577Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock 2025-09-07T09:27:48.1849013Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock/fine_grained_scale_zero_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock 2025-09-07T09:27:48.1851335Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util 2025-09-07T09:27:48.1853642Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util/gather_tensor.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util 2025-09-07T09:27:48.1856564Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/weight_only_quant_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions 2025-09-07T09:27:48.1858436Z #47 693.0 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:48.1859427Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T09:27:48.1861199Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T09:27:48.1863563Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T09:27:48.1866011Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_type_conversion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels 2025-09-07T09:27:48.1867840Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T09:27:48.1869804Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T09:27:48.1872476Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm_stub.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm 2025-09-07T09:27:48.1874433Z #47 693.0 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1876343Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scalebias.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1878936Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scaleonly.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1881504Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_per_col.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1884072Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scalebias.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1886651Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scaleonly.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1889235Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_per_col.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1892007Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_bf16_out_bf16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1895075Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_f16_out_f16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1897848Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_bf16_out_bf16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1900660Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_f16_out_f16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1903402Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_per_col_f16_out_f16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1906168Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scalebias.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1908744Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scaleonly.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1911311Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_per_col.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1913870Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scalebias.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1916476Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scaleonly.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1919040Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_per_col.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1921523Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 
2025-09-07T09:27:48.1924100Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1926734Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template_sm90.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm 2025-09-07T09:27:48.1928712Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T09:27:48.1930736Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T09:27:48.1933788Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers 2025-09-07T09:27:48.1935846Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:48.1937632Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/common.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:48.1940105Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/cutlass_kernel_selector.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:48.1942631Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_gemm_kernels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:48.1945086Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_kernels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:48.1947586Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_util_kernels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include 2025-09-07T09:27:48.1949358Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1950629Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:48.1952559Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:48.1955199Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.inl -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:48.1957843Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:48.1960478Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:48.1963158Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:48.1965876Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers 2025-09-07T09:27:48.1968456Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_bf16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1970869Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp4.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1973522Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp8.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1976080Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1978667Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint8.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1981226Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp16.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1983790Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1986477Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1988919Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint8.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1991374Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp32_fp32.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1994223Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp4_fp4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1996823Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.1999362Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp8.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.2001916Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_uint4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.2003178Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.2004432Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.2005776Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws_mixed_dtype.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.2006968Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_tma_warp_specialized_input.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.2008148Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_tma_warp_specialized_traits.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm 2025-09-07T09:27:48.2009017Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/delayStream.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:48.2009898Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/delayStream.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:48.2010294Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 
2025-09-07T09:27:48.2011179Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora/lora.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 2025-09-07T09:27:48.2012059Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora/lora.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora 2025-09-07T09:27:48.2013315Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:48.2014290Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:48.2015273Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/quantization.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:48.2016195Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/quantization.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels 2025-09-07T09:27:48.2016591Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime 2025-09-07T09:27:48.2017521Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime/torchUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime 2025-09-07T09:27:48.2017891Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:48.2018814Z 
#47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Op.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:48.2019713Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:48.2020600Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:48.2021511Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.cpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:48.2022403Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:48.2023260Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/thUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc/nv_internal/tensorrt_llm/thop 2025-09-07T09:27:48.2023887Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/nvshmem_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2024441Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/page.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2025081Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pod.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2025736Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pod_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2026313Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pod_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2026840Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pod_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2027397Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pod_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2028033Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pytorch_conversion_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2028605Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/pytorch_extension_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2029153Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/quantization.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2029871Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/renorm.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2030397Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/rope.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2030951Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/runtime_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2031497Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/sampling.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2032064Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_decode.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2032692Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_decode_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2033359Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_decode_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2033979Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_decode_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2034612Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_decode_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2035175Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2035793Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2036462Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2037061Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_fp8_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2037749Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_fp8_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2038365Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_jit_pybind.cu -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2039002Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2039598Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2040222Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_sm90_config.inc -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2040911Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_sm90_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2041616Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_sm90_jit_pybind.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2042396Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/single_prefill_sm90_kernel_inst.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2042938Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_allreduce.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2043548Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_allreduce_fusion.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2044085Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_alltoall.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2044678Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_batched_gemm_runner.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2045266Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fmha_kernel_launcher.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2045878Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_dev_kernel.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2046500Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_kernel_launcher.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2047116Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_routing_deepseek.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2047724Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_routing_llama4.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2048364Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_routing_renormalize.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2048935Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_fused_moe_runner.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2049482Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_gemm_runner.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2050069Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_mnnvl_allreduce.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2050664Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/trtllm_moe_allreduce_fusion.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2051230Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/csrc/vllm_custom_all_reduce.cu 
-> build/bdist.linux-x86_64/wheel/./flashinfer/data/csrc 2025-09-07T09:27:48.2051459Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include 2025-09-07T09:27:48.2051718Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2052472Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/activation.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2053369Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/allocator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2054126Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/arch_condition.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2054523Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2054973Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/blackwell 2025-09-07T09:27:48.2055446Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:48.2056586Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/fmha_common.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:48.2057762Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/fmha_fusion.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:48.2059053Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_fwd_epilogue_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:48.2060386Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_fwd_mainloop_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:48.2061655Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_gen_epilogue_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:48.2062921Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_gen_mainloop_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:48.2064209Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_load_cpasync_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:48.2065541Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/collective/sm100_fmha_load_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/collective 2025-09-07T09:27:48.2065984Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/blackwell/common 2025-09-07T09:27:48.2066966Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/common/pow_2.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/common 2025-09-07T09:27:48.2067393Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T09:27:48.2068391Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/device/fmha.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T09:27:48.2069393Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/device/sm100_mla.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/device 2025-09-07T09:27:48.2070345Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/fmha_cutlass_sm100.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell 2025-09-07T09:27:48.2070808Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:48.2071859Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/fmha_options.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:48.2072910Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/fmha_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:48.2073988Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/gather_tensor.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:48.2075138Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/sm100_fmha_fwd_kernel_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:48.2076284Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/sm100_fmha_gen_kernel_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:48.2077387Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/sm100_fmha_mla_reduction.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:48.2078507Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/sm100_fmha_mla_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:48.2079587Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/kernel/sm100_mla_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell/kernel 2025-09-07T09:27:48.2080488Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/blackwell/plan.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/blackwell 2025-09-07T09:27:48.2081318Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/cascade.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2082140Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/cutlass_mla.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2082949Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/decode.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2083820Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/decode_mla_cute_sm80.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2084795Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/default_decode_params.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2085620Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/default_prefill_params.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2086368Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/heap.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2087156Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2087533Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2088428Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/attention_updater.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2089339Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/block_sparse_gather.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2090215Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/default_params.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2091056Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/epilogue.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2092064Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/kernel_traits.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2093299Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/mainloop.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2094273Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/mainloop_mma.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2095244Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/named_barrier.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2096222Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2096696Z #47 693.0 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:48.2097833Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/epilogue.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:48.2098967Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/kernel_traits.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:48.2100108Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/mainloop_load.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:48.2101263Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/mainloop_mma.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:48.2102437Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/mainloop_sparse_load.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:48.2103572Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/quantization/prefill_sm90.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper/quantization 2025-09-07T09:27:48.2104870Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/sparse_mainloop.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2105868Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/tile_scheduler.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2106747Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/utils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2107614Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/variant_helper.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2108456Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/hopper/variants.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention/hopper 2025-09-07T09:27:48.2109214Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/mask.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2109980Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/mla.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2110752Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/mla_hopper.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2111532Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/mla_params.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2112316Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/persistent.cuh -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2113158Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/persistent_template.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2113899Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/pod.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2114665Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/prefill.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2115452Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/scheduler.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2116206Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/state.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2116998Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/variant_helper.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2117783Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention/variants.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/attention 2025-09-07T09:27:48.2118480Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/attention_impl.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2118991Z #47 693.0 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:48.2119973Z #47 693.0 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/trtllm_allreduce.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:48.2120821Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/trtllm_allreduce_fusion.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:48.2121672Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/trtllm_alltoall.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:48.2122513Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/trtllm_mnnvl_allreduce.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:48.2123382Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/trtllm_moe_allreduce_fusion.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:48.2124229Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/comm/vllm_custom_all_reduce.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/comm 2025-09-07T09:27:48.2124979Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/cp_async.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2125713Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/cubin_loader.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2126471Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/cutlass_utils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2127194Z #47 693.0 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/exception.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2127905Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/fastdiv.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2128611Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/fp16.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2129336Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/fp4_layout.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2130100Z #47 693.0 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/frag_layout_swizzle.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2130429Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2131194Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/bmm_fp8.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2132036Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/cutlass_gemm_configs.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2133053Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp4_gemm_cutlass.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2133920Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp4_gemm_cutlass_template.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2134861Z #47 693.1 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp4_gemm_template_sm100.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2135682Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp8_gemm_cutlass.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2136541Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp8_gemm_cutlass_template.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2137439Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/fp8_gemm_template_sm100.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2138295Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/gemm_groupwise_sm100.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2139103Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemm.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2140041Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemm_fp8_groupwise_sm100.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2140876Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemm_lora.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2141800Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemm_mxfp4_groupwise_sm100.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 
2025-09-07T09:27:48.2142627Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemm_sm90.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2143436Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/gemm/group_gemv.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/gemm 2025-09-07T09:27:48.2144304Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/layout.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2145105Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/logging.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2145740Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/math.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2146383Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/mma.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2147022Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/norm.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2147659Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/page.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2148361Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/permuted_smem.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2149004Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/pos_enc.cuh -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2149705Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/profiler.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2150481Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/quantization.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2151145Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/sampling.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2151436Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm 2025-09-07T09:27:48.2151836Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/batched_gemm 2025-09-07T09:27:48.2444824Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/KernelRunner.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm 2025-09-07T09:27:48.2445488Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2446696Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmEnums.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2448091Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmInterface.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2449294Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/BatchedGemmOptions.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2450416Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/Enums.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2451645Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/GemmGatedActOptions.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2453094Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/GemmOptions.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2454309Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelParams.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2455540Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelParamsDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2456745Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/KernelTraits.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2457961Z #47 693.1 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/TmaDescriptor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export 2025-09-07T09:27:48.2458688Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm 2025-09-07T09:27:48.2459277Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:48.2460618Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/CommonUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:48.2462028Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/CudaKernelLauncher.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:48.2463384Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/DtypeDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:48.2464852Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/MmaDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:48.2466134Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen/SfLayoutDecl.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/batched_gemm/trtllmGen_bmm_export/trtllm/gen 2025-09-07T09:27:48.2466921Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm 2025-09-07T09:27:48.2467286Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:48.2468247Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common/cudaBf16Fallbacks.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:48.2469171Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common/cudaBf16Wrapper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:48.2470076Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common/cudaFp8Utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:48.2471009Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common/cudaTypeUtils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:48.2471895Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/common/cudaUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/common 2025-09-07T09:27:48.2472241Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:48.2472638Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/fmha/cubin 2025-09-07T09:27:48.2473607Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/cubin/kernelMetaInfo.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha/cubin 2025-09-07T09:27:48.2474499Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/decoder_impl_common.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:48.2475448Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/decoder_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:48.2476326Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/fmhaKernels.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:48.2477219Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/fmhaRunner.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:48.2478154Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/fmhaRunnerParams.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:48.2479038Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/kernelParams.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:48.2479876Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fmha/lse.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fmha 2025-09-07T09:27:48.2480248Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:48.2481166Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/DevKernel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:48.2482067Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/IntFastDiv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:48.2482988Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/RoutingKernel.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:48.2483898Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/RoutingKernel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:48.2484860Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/RoutingKernelTopK.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:48.2485731Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/fused_moe/runner.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/fused_moe 2025-09-07T09:27:48.2486092Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/gemm 2025-09-07T09:27:48.2486571Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:48.2487649Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/Enums.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:48.2488772Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/GemmInterface.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:48.2489867Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/GemmOptions.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:48.2491020Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/KernelParams.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:48.2492360Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/KernelTraits.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:48.2493820Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/TmaDescriptor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export 2025-09-07T09:27:48.2494379Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm 2025-09-07T09:27:48.2494928Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:48.2496196Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/CommonUtils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:48.2497557Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/CudaKernelLauncher.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:48.2498813Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/DtypeDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:48.2500042Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/MmaDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:48.2501328Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen/SfLayoutDecl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer/trtllm/gemm/trtllmGen_gemm_export/trtllm/gen 2025-09-07T09:27:48.2502057Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/utils.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2502822Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/include/flashinfer/vec_dtypes.cuh -> build/bdist.linux-x86_64/wheel/./flashinfer/data/include/flashinfer 2025-09-07T09:27:48.2503089Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/tvm_binding 2025-09-07T09:27:48.2503748Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_decode.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2504633Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_decode_customize_config.jinja -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2505340Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_decode_jit_tvm_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2506005Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_mla_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2506702Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_mla_jit_tvm_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2507375Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_mla_plan.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2508039Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_mla_run.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2508693Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2509439Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2510175Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill_jit_tvm_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2510844Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill_sm90.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2511600Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill_sm90_customize_config.jinja -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2512360Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/batch_prefill_sm90_jit_tvm_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2512977Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/sampling.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2513662Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/sampling_jit_tvm_binding.cu -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2514321Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/tvm_binding/tvm_binding_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/tvm_binding 2025-09-07T09:27:48.2514839Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/version.txt -> build/bdist.linux-x86_64/wheel/./flashinfer/data 2025-09-07T09:27:48.2515063Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot 2025-09-07T09:27:48.2515330Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/logging 2025-09-07T09:27:48.2515968Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/logging/logging.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/logging 2025-09-07T09:27:48.2516839Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2519135Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2520048Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2522470Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2523346Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.2525524Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.2526381Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.2528675Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.2529540Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2531825Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2532997Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2535513Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2536369Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.2538617Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.2539573Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.2541910Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.2542873Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2545510Z #47 693.1 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2546415Z #47 693.1 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2548812Z #47 693.1 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.2549627Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.2551737Z #47 693.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3511670Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3514430Z #47 693.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3515334Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3517759Z #47 693.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3518701Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3521166Z #47 693.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3522015Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3524370Z #47 693.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3525171Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3527300Z #47 693.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3528182Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3530336Z #47 693.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3531174Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3533835Z #47 693.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3534681Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3536897Z #47 693.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3537783Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3540141Z #47 693.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3541039Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3543498Z #47 693.2 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3544500Z #47 693.2 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3546915Z #47 693.2 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.3547701Z #47 693.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.3550148Z #47 693.3 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4531197Z #47 693.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4534015Z #47 693.3 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4534964Z #47 693.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.4537444Z #47 693.3 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.4538601Z #47 693.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.4541378Z #47 693.3 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.4542246Z #47 693.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4544697Z #47 693.3 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4545672Z #47 693.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4547947Z #47 693.3 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4548825Z #47 693.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.4551137Z #47 693.3 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.4552057Z #47 693.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.4554487Z #47 693.3 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False 2025-09-07T09:27:48.4555374Z #47 693.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4557531Z #47 693.3 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4558404Z #47 693.3 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4560698Z #47 693.3 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False 2025-09-07T09:27:48.4561614Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4564029Z #47 693.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4564963Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4567431Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4568368Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4570855Z #47 693.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4571786Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4574666Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4575649Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4578269Z #47 693.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4579254Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4581892Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 2025-09-07T09:27:48.4582753Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T09:27:48.4585068Z #47 693.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T09:27:48.5685130Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T09:27:48.5688409Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T09:27:48.5691611Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T09:27:48.5695351Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T09:27:48.5698593Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T09:27:48.5701858Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T09:27:48.5705190Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T09:27:48.5708336Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_ 2025-09-07T09:27:48.5711474Z #47 693.4 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T09:27:48.5714709Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_ 2025-09-07T09:27:48.5718009Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T09:27:48.5721217Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90 2025-09-07T09:27:48.5724471Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T09:27:48.5727705Z #47 693.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90 2025-09-07T09:27:48.5730362Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/fmha_cutlass_sm100a 2025-09-07T09:27:48.5731585Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/fmha_cutlass_sm100a/fmha_cutlass_sm100a.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/fmha_cutlass_sm100a 2025-09-07T09:27:48.5733501Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T09:27:48.5736373Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T09:27:48.5739281Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T09:27:48.5742247Z #47 693.4 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T09:27:48.5745335Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T09:27:48.5748249Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False 2025-09-07T09:27:48.5751151Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T09:27:48.5754115Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90 2025-09-07T09:27:48.5756539Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/mla 2025-09-07T09:27:48.5757481Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/mla/mla.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/mla 2025-09-07T09:27:48.5758415Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/cascade 2025-09-07T09:27:48.5759438Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/cascade/cascade.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/cascade 2025-09-07T09:27:48.5760431Z #47 693.4 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/norm 2025-09-07T09:27:48.5761383Z #47 693.4 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/norm/norm.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/norm 2025-09-07T09:27:48.5762333Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/page 2025-09-07T09:27:48.5763273Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/page/page.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/page 2025-09-07T09:27:48.5764251Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/quantization 2025-09-07T09:27:48.5765377Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/quantization/quantization.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/quantization 2025-09-07T09:27:48.5766470Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/rope 2025-09-07T09:27:48.5767423Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/rope/rope.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/rope 2025-09-07T09:27:48.5768376Z #47 693.5 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/aot/sampling 2025-09-07T09:27:48.5769427Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/sampling/sampling.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/sampling 2025-09-07T09:27:48.5770477Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/aot/trtllm_utils 2025-09-07T09:27:48.6685401Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/aot/trtllm_utils/trtllm_utils.so -> build/bdist.linux-x86_64/wheel/./flashinfer/data/aot/trtllm_utils 2025-09-07T09:27:48.6686578Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass 2025-09-07T09:27:48.6687231Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include 2025-09-07T09:27:48.6688123Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6688970Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6690318Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/axpby.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6692358Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/clear.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6694646Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/cooperative_copy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6696661Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/cooperative_gemm.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6698591Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/copy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6700450Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/fill.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6702421Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/functional.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6704424Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/gemm.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6706252Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/prefer.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6708106Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/prefetch.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6709996Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/tensor_algorithms.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6711911Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/tensor_reduce.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 2025-09-07T09:27:48.6713820Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/algorithm/tuple_algorithms.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/algorithm 
2025-09-07T09:27:48.6715155Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6716431Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/cluster_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6718195Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/cluster_sm90.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6719918Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/config.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6721618Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6723359Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6725135Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm100_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6726874Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm50.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6728621Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm75.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6730324Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm80.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6732043Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm90.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6734048Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm90_desc.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6735881Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/copy_sm90_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6737646Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6739380Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6741147Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm100_desc.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6742951Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm100_umma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6744841Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm120.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6746569Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm120_sparse.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6748307Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm61.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6749992Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm70.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6751680Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm75.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6753380Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm80.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6755060Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm89.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6756826Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6758542Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90_desc.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6760266Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6762048Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_ext.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6763845Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_sparse.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6765645Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_sparse_ext.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6767454Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/simd_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6769230Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/tmem_allocator_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6770962Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/arch/util.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/arch 2025-09-07T09:27:48.6772187Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6773706Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_atom.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6775480Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6777306Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6779187Z #47 
693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100_im2col.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6781065Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6782923Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm50.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6784870Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm75.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6786626Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm80.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6788389Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6790258Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_im2col.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6792227Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6794362Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_tma_swizzle.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6796213Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_atom.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6797974Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6799783Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm100.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6801653Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm120.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6803503Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm120_sparse.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6805450Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm61.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6807213Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm70.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6808955Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm75.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6810712Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm80.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6812528Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm89.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6814484Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6816324Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6818202Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_ext.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6820094Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6822037Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse_ext.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6823986Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/atom/partitioner.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/atom 2025-09-07T09:27:48.6825842Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/config.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6827061Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/container 
2025-09-07T09:27:48.6828452Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/alignment.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:48.6830282Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/array.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:48.6832128Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/array_aligned.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:48.6834019Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/array_subbyte.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:48.6835893Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/bit_field.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:48.6837742Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/cuda_types.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:48.6839579Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/tuple.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:48.6841390Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/container/type_list.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/container 2025-09-07T09:27:48.6843118Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/int_tuple.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6844721Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/layout.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6846369Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/layout_composed.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6847615Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:48.6848974Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/arithmetic_tuple.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:48.6850797Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/complex.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:48.6852661Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/int.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:48.6854705Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/integer_sequence.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:48.6856678Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/integral_constant.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:48.6858651Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/integral_ratio.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:48.6860521Z #47 693.5 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/math.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:48.6862399Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/numeric_types.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:48.6864263Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/numeric/real.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/numeric 2025-09-07T09:27:48.6866084Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/pointer.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6867711Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/pointer_base.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6869437Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/pointer_flagged.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6871131Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/pointer_sparse.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6872808Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/pointer_swizzle.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6874463Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/stride.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6876069Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/swizzle.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6877699Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/swizzle_layout.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6879338Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/tensor.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6880957Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/tensor_impl.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6882579Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/tensor_zip.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6884222Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/underscore.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute 2025-09-07T09:27:48.6885417Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:48.6886650Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/debug.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:48.6888341Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/print.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:48.6890074Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/print_latex.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:48.6891825Z #47 693.5 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/print_svg.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:48.6893983Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/print_tensor.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:48.6895837Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cute/util/type_traits.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cute/util 2025-09-07T09:27:48.6897122Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6898403Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/aligned_buffer.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6899693Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6901001Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/arch.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6902843Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/barrier.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6904699Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/cache_operation.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6906625Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/config.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6908462Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/grid_dependency_control.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6910288Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/memory.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6912058Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/memory_sm75.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6913840Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/memory_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6915577Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6917311Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm100.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6919050Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm50.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6920799Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm60.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6922545Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm61.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6924345Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6926085Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm75.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6927862Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6929596Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm89.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6931345Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sm90.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6933372Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sparse_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6935277Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/mma_sparse_sm89.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6937148Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/reg_reconfig.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6938966Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/simd.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6940751Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/simd_sm60.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6942568Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/simd_sm61.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6944407Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/synclog.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6946276Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/wmma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6948017Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/wmma_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6949780Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/wmma_sm72.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6951518Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/arch/wmma_sm75.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/arch 2025-09-07T09:27:48.6953209Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/array.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6954907Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/array_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6956657Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/array_subbyte.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6958363Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/barrier.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6960028Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/bfloat16.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6961684Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/blas3.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6963339Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/blas3_types.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6965034Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/block_striped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6966751Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/cluster_launch.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6968479Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6970145Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/constants.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.6971377Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv 
2025-09-07T09:27:48.6972242Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:48.6973459Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T09:27:48.6975193Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm100_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T09:27:48.6977622Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm100_umma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T09:27:48.6980010Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm90_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T09:27:48.6982407Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm90_gmma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective/builders 2025-09-07T09:27:48.6984853Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/collective_builder.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:48.6987008Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/collective_conv.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:48.6989080Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:48.6991321Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/sm100_implicit_gemm_umma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:48.6994085Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/collective/sm90_implicit_gemm_gmma_ss_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/collective 2025-09-07T09:27:48.6996258Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/conv2d_problem_size.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:48.6998268Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/conv3d_problem_size.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:48.7000215Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/convnd_problem_shape.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:48.7002134Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:48.7004035Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:48.7005501Z #47 693.5 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T09:27:48.7006969Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device/conv_universal_adapter.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T09:27:48.7009028Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device/direct_convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T09:27:48.7011100Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device/implicit_gemm_convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T09:27:48.7013551Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/device/implicit_gemm_convolution_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/device 2025-09-07T09:27:48.7015656Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/dispatch_policy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv 2025-09-07T09:27:48.7017090Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7018579Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/conv_universal.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7020645Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7022727Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_dgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7024925Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7027050Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7029219Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7031412Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7033586Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7035731Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_group_fprop.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7037786Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7039882Z 
#47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7041958Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_dgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7043981Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7046055Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7048197Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7050306Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_wgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7052315Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv2d.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7054679Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv2d_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7056824Z #47 
693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv3d.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7058958Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv3d_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7061173Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/default_depthwise_fprop.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7063319Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/direct_convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7065522Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7067677Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7069877Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_strided_dgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7072084Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 
2025-09-07T09:27:48.7074355Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_fused_epilogue.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7076622Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/sm100_implicit_gemm_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7078857Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/kernel/sm90_implicit_gemm_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/kernel 2025-09-07T09:27:48.7080407Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/thread 2025-09-07T09:27:48.7081855Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/thread/depthwise_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/thread 2025-09-07T09:27:48.7083302Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7084993Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7087435Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7089907Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7092799Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7095384Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7098052Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_few_channels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7100634Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_fixed_channels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7103256Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7105866Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 
2025-09-07T09:27:48.7108287Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_few_channels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7110796Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_fixed_channels.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7113249Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7115522Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7117644Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7119967Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7122447Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7124941Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7127465Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7129936Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7132453Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7135207Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7137846Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7140428Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 
2025-09-07T09:27:48.7142984Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7145663Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7148099Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7150368Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7152640Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7155113Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7157618Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_analytic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 
2025-09-07T09:27:48.7160121Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7162523Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_direct_conv_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7165009Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_fixed_stride_dilation.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7167676Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7170212Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_direct_conv_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7172776Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_filter_tile_access_iterator_direct_conv_optimized.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7175447Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_pipelined.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7177702Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_mma_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7180043Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_mma_core_with_lane_access_size.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7182457Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_fprop_fusion_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7184897Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7187114Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7189393Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_wgrad_fusion_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7191757Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7194538Z #47 693.5 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7196875Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/threadblock/threadblock_swizzle.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/threadblock 2025-09-07T09:27:48.7198450Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T09:27:48.7199895Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp/mma_depthwise_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T09:27:48.7201990Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp/mma_depthwise_simt_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T09:27:48.7204179Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/conv/warp/scale_bias_relu_transform.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/conv/warp 2025-09-07T09:27:48.7206172Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/coord.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7207800Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/core_io.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7209527Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/cuda_host_adapter.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 
2025-09-07T09:27:48.7211222Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/cutlass.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7212519Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7214136Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/blockwise_scale_layout.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7216218Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/cluster.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7218149Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/collective.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7219605Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/detail/collective 2025-09-07T09:27:48.7221230Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/collective/mixed_input_utils.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail/collective 2025-09-07T09:27:48.7223357Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/dependent_false.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7225423Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/helper_macros.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7227283Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/layout.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7229245Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/mainloop_fusion_helper_scale_factor.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7231179Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/mma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7233066Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/sm100_blockscaled_layout.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7235028Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/detail/sm100_tmem_helper.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/detail 2025-09-07T09:27:48.7236830Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/device_kernel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7238142Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue 2025-09-07T09:27:48.7239085Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7240104Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:48.7241878Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm100_builder.inl -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:48.7244352Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm120_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:48.7246773Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm120_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:48.7249198Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm90_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:48.7251640Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm90_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders 2025-09-07T09:27:48.7254287Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/collective_builder.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7256652Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/collective_epilogue.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7258992Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/default_epilogue.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 
2025-09-07T09:27:48.7261326Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/default_epilogue_array.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7263620Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7266001Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/epilogue_tensor_broadcast.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7268337Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_nosmem.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7270723Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7273089Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_nosmem.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7275502Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7277847Z #47 693.5 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7280201Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized_array.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7282598Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_array_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7285006Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7287511Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized_bias_elementwise.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/collective 2025-09-07T09:27:48.7289773Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/dispatch_policy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue 2025-09-07T09:27:48.7291220Z #47 693.5 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7293131Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/callbacks.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7295299Z #47 693.5 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/operations.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7297588Z #47 693.5 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_callbacks_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7299994Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_compute_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7302425Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_store_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7304828Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm120_callbacks_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7307258Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm120_visitor_store_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7309594Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7312037Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_compute_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7314356Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_load_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7316714Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_store_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7319027Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7321269Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/fusion 2025-09-07T09:27:48.7322871Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7324389Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/activation.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7326460Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/conversion_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7328522Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7330610Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7333068Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_elementwise.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7335382Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_relu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7337660Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_clamp.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7339923Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_dgelu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7342173Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_drelu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7344426Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_gelu.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7346800Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_generic.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7349056Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_generic_with_scaling.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7351355Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_hardswish.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7353574Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_leaky_relu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7355775Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7358041Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7360262Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_relu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7362446Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_relu0.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7364668Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_residual_block.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7366898Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_sigmoid.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7369089Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_silu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7371320Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_tensor_broadcast.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7373907Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_with_elementwise.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7376168Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/reduction_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7378277Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/thread/scale_type.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/thread 2025-09-07T09:27:48.7379843Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7381637Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7384168Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op_blas3.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7386736Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_direct_store.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7389113Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7391446Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7394187Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7396627Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op_blas3.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7399085Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_volta_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7401511Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7403961Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7406488Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7408849Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_wmma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7411180Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7413759Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7416180Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_volta_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7418689Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_wmma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7421191Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/direct_store_epilogue_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7423575Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7438561Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7440861Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_base_streamk.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7443228Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_depthwise.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7445529Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_direct_store.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7447832Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_gemm_k_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7450142Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7452584Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_smem_accumulator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7455189Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_streamk_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7457644Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_visitor_with_softmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7460022Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7462387Z 
#47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7464878Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7467168Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7469577Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor_callbacks.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7471906Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_workspace.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7473608Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:48.7475341Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_2x.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:48.7477748Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_compute.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 
2025-09-07T09:27:48.7480188Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_load.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:48.7482596Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_store.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:48.7484989Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitors.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion 2025-09-07T09:27:48.7487323Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/interleaved_epilogue.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7489635Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/output_iterator_parameter.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7492109Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/output_tile_thread_map.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7494735Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7497163Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7499695Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine_layout_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7502229Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_blas3.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7504848Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_conv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7507307Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_direct_conv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7509776Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7512195Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_predicates.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7514638Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_strided_dgrad.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7517043Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7519351Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_mixed.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7521695Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_pitch_linear.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/threadblock 2025-09-07T09:27:48.7523351Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7524934Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7527198Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_gaussian_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7529420Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7531559Z #47 
693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7534009Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_volta_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7536281Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7538435Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/simt_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7540557Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tensor_op_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7542714Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7544994Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7547119Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op_mixed.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 
2025-09-07T09:27:48.7549253Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_volta_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7551402Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7553557Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/volta_tensor_op_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7555625Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/epilogue/warp/wmma_tensor_op_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/epilogue/warp 2025-09-07T09:27:48.7557500Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/exmy_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7558791Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/experimental 2025-09-07T09:27:48.7559756Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/experimental/distributed 2025-09-07T09:27:48.7560842Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T09:27:48.7562627Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T09:27:48.7565166Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/dist_gemm_universal_wrapper.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T09:27:48.7567726Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/full_barrier.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/device 2025-09-07T09:27:48.7569541Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T09:27:48.7571318Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T09:27:48.7574122Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/dist_gemm_kernel_wrapper.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T09:27:48.7576827Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/full_barrier.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel 2025-09-07T09:27:48.7578694Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T09:27:48.7580680Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_1d_schedules.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T09:27:48.7583408Z 
#47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_base_schedule.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules 2025-09-07T09:27:48.7585711Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/fast_math.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7587399Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/float8.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7589081Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/float_subbyte.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7590810Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/floating_point_nvrtc.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7592895Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/functional.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7594194Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T09:27:48.7595077Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7596090Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7597876Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_9xBF16_umma_builder.inl -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7600440Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_sparse_umma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7603039Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_umma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7605668Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockwise_umma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7608045Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7610484Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_pipeline_carveout.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7613158Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_simt_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7615659Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_sparse_umma_builder.inl -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7618131Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_umma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7620624Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_mma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7623229Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_sparse_mma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7625919Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockwise_mma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7628298Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7630639Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_mma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7633011Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_sparse_mma_builder.inl -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7635383Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm1xx_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7637728Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm1xx_sparse_config.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7638819Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_common.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7639931Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_gmma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7641061Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_config.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7642230Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_gmma_builder.inl -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective/builders 2025-09-07T09:27:48.7643291Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_builder.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7644326Z 
#47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_builder_decl.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7645346Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_mma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7646378Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_mma_decl.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7647377Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/fp8_accumulation.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7648556Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7649661Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7650799Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_sparse_mma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7651883Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7653281Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7654430Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_emulated.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7655524Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7656684Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7685866Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_emulated.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7687029Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_sparse_mma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7688232Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_array_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 
2025-09-07T09:27:48.7689277Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7690387Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_sparse_mma_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7691474Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_array_tma_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7692944Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7694135Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_tma_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7695174Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_sparse_mma_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7696216Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm70_mma_twostage.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7697280Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm80_mma_array_multistage.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7698317Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm80_mma_multistage.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7699532Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7700679Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7701845Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7703089Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7704245Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_rs_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7705602Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_ss_warpspecialized.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7706673Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7707852Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized_mixed_input.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7708846Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7709929Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7711069Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7712259Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7713386Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7714509Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized_fp8.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/collective 2025-09-07T09:27:48.7714889Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7715826Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/base_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7716815Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/default_gemm_configuration.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7717710Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/ell_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7718594Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7719493Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_array.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7720417Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_batched.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7721357Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7722301Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7723313Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_layernorm_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7724240Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7725197Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7726218Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_universal_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7727211Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7728193Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7729141Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_splitk_parallel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7730068Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7731044Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7731982Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7733250Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_streamk_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7734282Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7735301Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7736291Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_with_k_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7737194Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/gemv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7738163Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/rank_2k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7739125Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/rank_2k_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7740105Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/rank_k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7741008Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/symm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7741926Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/device/trmm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/device 2025-09-07T09:27:48.7742829Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/dispatch_policy.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T09:27:48.7743700Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T09:27:48.7744611Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/gemm_enumerated_types.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T09:27:48.7745635Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/group_array_problem_shape.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm 2025-09-07T09:27:48.7746026Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7746960Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_ell_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7747878Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7748853Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7749802Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7750824Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_per_group_scale.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7751904Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_softmax_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7752947Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_layernorm_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7754026Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_planar_complex_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7755005Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7756000Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7757087Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7758088Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7759112Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7760143Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_splitk_parallel.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7761171Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_streamk_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7762152Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7763173Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_universal_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7764144Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7765144Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_broadcast.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7766133Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_k_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7767123Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7768045Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7768966Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7769940Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7770958Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7771939Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7773132Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7774126Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7775143Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7776112Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7777096Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7778104Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7779050Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7780044Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7781036Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7781953Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/ell_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7782865Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7783795Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_array.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7784845Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_batched.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7785922Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7786900Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_per_group_scale.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7787957Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_problem_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7788996Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_softmax_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7790019Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_layernorm_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7790921Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7791856Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7793312Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7794332Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex_array.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7795315Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7796361Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7797355Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_splitk_parallel.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7798388Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_streamk_with_fused_epilogue.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7799383Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_transpose_operands.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7800356Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7801322Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7802308Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_decl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7803292Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_streamk.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7804346Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7805532Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor_streamk.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7806495Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7807465Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_fused_epilogue.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7808415Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_k_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7809287Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7810268Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemv_batched_strided.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7811248Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/grouped_problem_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7812187Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/params_sparse_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7813401Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/params_universal_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7814371Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7815391Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped_problem_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7816402Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_transpose_operands.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7817354Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7818304Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_k_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7819385Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7820534Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_input_transform.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7821731Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_mma_transform.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7822785Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7823940Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_input_transform.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7825154Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_mma_transform.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7826197Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_sparse_gemm_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7827215Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_static_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7828177Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7829165Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_group.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7830161Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_stream_k.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7831297Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm120_gemm_tma_warpspecialized_cooperative_asymmetric_dma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7832196Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm70_gemm.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7833139Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm70_gemm_array.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7834233Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_cooperative.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7835308Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_pingpong.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7836233Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7837265Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7838365Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7839437Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7840426Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7841492Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_cooperative.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7842526Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_pingpong.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7843513Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7844490Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_group.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7845470Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_stream_k.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7846385Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7847339Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_absmax.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7848302Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_visitor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7849271Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/static_tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7850190Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/symm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7851214Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7852173Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler_detail.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7853357Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7854380Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/kernel/trmm_universal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/kernel 2025-09-07T09:27:48.7854762Z #47 693.6 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T09:27:48.7855655Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread/mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T09:27:48.7856602Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm50.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T09:27:48.7857512Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm60.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T09:27:48.7858423Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm61.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/thread 2025-09-07T09:27:48.7858866Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7859885Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_ell_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7860920Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_gemv_core.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7861918Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7862938Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7863987Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7865134Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7866138Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm75.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7867147Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7868200Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sparse_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7869267Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_access_size.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7870353Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_reduction.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7871392Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_wmma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7872515Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_layernorm_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7873611Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7874711Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7875813Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_softmax_mainloop_fusion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7876847Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_with_reduction.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7877917Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7878997Z #47 693.6 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7880105Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7881171Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_trmm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7882170Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_sparse_mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7883158Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_trmm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7884156Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/ell_mma_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7885143Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/ell_mma_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7886075Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/gemv.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7887105Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/index_remat.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7888058Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7889093Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_blas3_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7890198Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_layernorm_mainloop_fusion_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7891189Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7892649Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7893714Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7894794Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7895877Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_pipelined.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7896907Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_singlestage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7898037Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_softmax_mainloop_fusion_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7899036Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_sparse_base.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7900096Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_sparse_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7901186Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_with_reduction_multistage.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7902250Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7903345Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle_streamk.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/threadblock 2025-09-07T09:27:48.7903799Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7904912Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7905913Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_sparse_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7906835Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7907789Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7908788Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_with_reduction_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7909768Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_wmma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 
2025-09-07T09:27:48.7910750Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/layernorm_scale_bias_transform.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7911602Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7912537Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7913488Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_fast_f32.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7914489Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_tile_iterator_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7915456Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7916495Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op_tile_iterator_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7917437Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_mixed_input_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 
2025-09-07T09:27:48.7918348Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7919214Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7920168Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7921091Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7922028Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_sparse_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7922912Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7923826Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fast_f32.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7924829Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fragment_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7925745Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_policy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7926641Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7927630Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7928573Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7929535Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7930506Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7931481Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sparse.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7932506Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_wmma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7933603Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_wmma.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7934584Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_with_reduction_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7935601Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/scale_bias_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7936636Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/softmax_scale_bias_transform.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7937631Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm/warp/tile_iterator_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/gemm/warp 2025-09-07T09:27:48.7938454Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm_coord.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7939259Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/gemm_coord.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7940030Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/half.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7940850Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/integer_subbyte.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7941722Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/kernel_hardware_info.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7942586Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/kernel_hardware_info.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7943394Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/kernel_launch.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7943754Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.7944623Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/layout.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.7945582Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/matrix.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.7946427Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/permute.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.7947297Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/pitch_linear.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.7948127Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/tensor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.7949065Z 
#47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.7949993Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm75.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.7950913Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/tensor_op_multiplicand_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.7951829Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/layout/vector.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/layout 2025-09-07T09:27:48.7952578Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/matrix.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7953354Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/matrix_coord.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7954163Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/matrix_shape.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7954972Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/numeric_conversion.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7955751Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/numeric_size.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 
2025-09-07T09:27:48.7956573Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/numeric_types.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7956933Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T09:27:48.7957833Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline/pipeline.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T09:27:48.7958744Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline/sm100_pipeline.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T09:27:48.7959647Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pipeline/sm90_pipeline.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/pipeline 2025-09-07T09:27:48.7960458Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/pitch_linear_coord.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7960811Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/platform 2025-09-07T09:27:48.7961686Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/platform/platform.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/platform 2025-09-07T09:27:48.7962502Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/predicate_vector.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7963288Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/quaternion.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7964025Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/real.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7964396Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/reduction 2025-09-07T09:27:48.7964798Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T09:27:48.7965785Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device/reduce_split_k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T09:27:48.7966804Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device/tensor_reduce.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T09:27:48.7967900Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device/tensor_reduce_affine_contiguous.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T09:27:48.7968985Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/device/tensor_reduce_affine_strided.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/device 2025-09-07T09:27:48.7969386Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T09:27:48.7970398Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel/reduce_softmax_final.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T09:27:48.7971383Z #47 693.6 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel/reduce_split_k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T09:27:48.7972533Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel/tensor_reduce_affine_contiguous.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T09:27:48.7973785Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/kernel/tensor_reduce_affine_strided.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/kernel 2025-09-07T09:27:48.7974203Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T09:27:48.7975190Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/thread/reduce.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T09:27:48.7976247Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/thread/reduction_operators.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction/thread 2025-09-07T09:27:48.7977211Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/reduction/threadblock_swizzle.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/reduction 2025-09-07T09:27:48.7978041Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/relatively_equal.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7978850Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/semaphore.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7979682Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/subbyte_reference.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7980485Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tensor_coord.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7981284Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tensor_ref.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7982147Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tensor_ref_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7982996Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tensor_view.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7983867Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tensor_view_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7984743Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/tfloat32.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7985218Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/thread 2025-09-07T09:27:48.7986031Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/thread/matrix.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/thread 2025-09-07T09:27:48.7986749Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/trace.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.7987103Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform 2025-09-07T09:27:48.7987513Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/collective 2025-09-07T09:27:48.7988588Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/collective/sm90_wgmma_transpose.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/collective 2025-09-07T09:27:48.7988984Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/device 2025-09-07T09:27:48.7990020Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/device/transform_universal_adapter.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/device 2025-09-07T09:27:48.7990408Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T09:27:48.7991443Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel/filter_format_transformer.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T09:27:48.7992986Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel/sm90_sparse_gemm_compressor.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T09:27:48.7994058Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/kernel/sparse_gemm_compressor.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/kernel 2025-09-07T09:27:48.7995035Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/pitch_linear_thread_map.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform 2025-09-07T09:27:48.7995451Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T09:27:48.7996453Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/thread/transpose.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T09:27:48.7997449Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/thread/unary_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/thread 2025-09-07T09:27:48.7997892Z #47 693.6 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.7999036Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/ell_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8000300Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/ell_predicated_tile_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8001480Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/ell_predicated_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8002726Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_scale_bias_vector_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8003934Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_scale_bias_vector_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8005252Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8006464Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_2dthreadtile.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8007632Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_params.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8008863Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_access_iterator_triangular_matrix.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8009974Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8011249Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator_2dthreadtile.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8012633Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_tile_iterator_triangular_matrix.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8013982Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/predicated_vector_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8015199Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_scale_bias_vector_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8016356Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8017900Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_pitch_linear.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8019177Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_pitch_linear_direct_conv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8020400Z #47 693.6 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8021625Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_access_iterator_tensor_op_sm80.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8022739Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8023951Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_pitch_linear.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8025314Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_pitch_linear_2dthreadtile.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8026469Z #47 693.6 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_tensor_op.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8027521Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/regular_tile_iterator_tensor_op_sm70.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8028483Z #47 693.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/threadblock/vector_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/threadblock 2025-09-07T09:27:48.8028836Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/include/cutlass/transform/warp 2025-09-07T09:27:48.8029769Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/transform/warp/vector_fragment_iterator.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass/transform/warp 2025-09-07T09:27:48.8030469Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/uint128.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.8031156Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/version.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.8031865Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/wmma_array.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.8032568Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/include/cutlass/workspace.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/include/cutlass 2025-09-07T09:27:48.8032868Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools 2025-09-07T09:27:48.8033123Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util 2025-09-07T09:27:48.8033410Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include 2025-09-07T09:27:48.8033736Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass 2025-09-07T09:27:48.8034110Z #47 693.7 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8035009Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/GPU_Clock.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8035907Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/command_line.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8036820Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/cublas_wrappers.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8037689Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/debug.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8038570Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_dump.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8039469Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_groupnorm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8040369Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_layernorm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8041259Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_memory.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8042154Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_nchw_to_nhwc.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8043062Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_nhwc_padding.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8043971Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_nhwc_pooling.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8044869Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_nhwc_to_nchw.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8045759Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_rmsnorm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8046636Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/device_utils.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8047561Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/distribution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8048477Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/exceptions.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8049412Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/gett_commandline.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8050297Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/helper_cuda.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8051175Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/host_reorder.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8052044Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/host_tensor.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8053286Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/host_tensor_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8054290Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/host_uncompress.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8055295Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/index_sequence.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8056329Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/mixed_dtype_utils.hpp -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8057341Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/packed_stride.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8058347Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/print_error.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8058814Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference 2025-09-07T09:27:48.8059321Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T09:27:48.8060553Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail/inner_product.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T09:27:48.8061806Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail/linear_to_coordinate.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/detail 2025-09-07T09:27:48.8062313Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8063570Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8064885Z #47 693.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8066153Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/gemm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8067254Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/gemm_planar_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8068303Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/gett.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8068802Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T09:27:48.8069947Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T09:27:48.8071136Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel/tensor_elementwise.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T09:27:48.8072306Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel/tensor_foreach.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/kernel 2025-09-07T09:27:48.8073387Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/rank_2k_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8074486Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_compare.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8075554Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8076637Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_foreach.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8077722Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_reduce.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8078787Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_relu.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device 2025-09-07T09:27:48.8079312Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/thread 2025-09-07T09:27:48.8080465Z #47 693.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/thread/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/thread 2025-09-07T09:27:48.8080903Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8081969Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/conv.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8083034Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/convolution.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8084093Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/error_metrics.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8085145Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/gemm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8086200Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/gemm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8087282Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/gemm_planar_complex.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8088314Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/gett.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8089342Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/rank_2k.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8090404Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/rank_2k_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8091458Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/rank_k_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8093075Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/symm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8094267Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/symm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8095472Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_compare.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8096764Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_compare.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8097998Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_copy.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8099268Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_elementwise.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8100446Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_fill.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8101642Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_fill.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8102891Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_foreach.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8104073Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_norm.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8105477Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_reduce.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8106550Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/tensor_reduce.hpp -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8107560Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/trmm.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8108623Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host/trmm_complex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/host 2025-09-07T09:27:48.8109667Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/tensor_view_io.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8110599Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/cutlass/tools/util/include/cutlass/util/type_traits.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/cutlass/tools/util/include/cutlass/util 2025-09-07T09:27:48.8111005Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog 2025-09-07T09:27:48.8111270Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include 2025-09-07T09:27:48.8111562Z #47 693.7 creating 
build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8112301Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/async.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8113161Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/async_logger-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8113949Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/async_logger.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8114270Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T09:27:48.8115032Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg/argv.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T09:27:48.8115805Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg/env.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T09:27:48.8116616Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg/helpers-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T09:27:48.8117397Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/cfg/helpers.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/cfg 2025-09-07T09:27:48.8118160Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/common-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8118915Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/common.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8119255Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8120139Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/backtracer-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8120993Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/backtracer.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8121835Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/circular_q.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8122717Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/console_globals.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8123580Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/file_helper-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8124423Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/file_helper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8125267Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/fmt_helper.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8126101Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/log_msg-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 
2025-09-07T09:27:48.8126924Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/log_msg.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8127782Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/log_msg_buffer-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8128651Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/log_msg_buffer.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8129559Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/mpmc_blocking_q.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8130389Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/null_mutex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8131228Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/os-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8132025Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/os.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8133164Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/periodic_worker-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8134097Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/periodic_worker.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8134995Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/registry-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8135857Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/registry.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8136791Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/synchronous_factory.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8137696Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/tcp_client-windows.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8138555Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/tcp_client.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8139460Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/thread_pool-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8140327Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/thread_pool.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8141230Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/udp_client-windows.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8142099Z #47 693.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/udp_client.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8142986Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/details/windows_include.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/details 2025-09-07T09:27:48.8143314Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.8144220Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bin_to_hex.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.8144634Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8145486Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/args.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8146352Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/chrono.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8147242Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/color.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8148120Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/compile.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8148971Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/core.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 
2025-09-07T09:27:48.8149892Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/fmt.license.rst -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8150788Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/format-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8151648Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/format.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8152509Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/locale.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8153353Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/os.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8154228Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/ostream.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8155103Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/printf.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8155962Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/ranges.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8156803Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/std.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8157694Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/bundled/xchar.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt/bundled 2025-09-07T09:27:48.8158472Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/chrono.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.8159259Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/compile.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.8160058Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/fmt.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.8160854Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/ostr.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.8161645Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/ranges.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.8162428Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/std.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.8163198Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fmt/xchar.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/fmt 2025-09-07T09:27:48.8163970Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/formatter.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 
2025-09-07T09:27:48.8164682Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/fwd.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8165466Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/logger-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8166215Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/logger.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8166924Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/mdc.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8167744Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/pattern_formatter-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8168556Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/pattern_formatter.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8168886Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8169721Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/android_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8170595Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/ansicolor_sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8171438Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/ansicolor_sink.h -> 
build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8172279Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/base_sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8173325Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/base_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8174213Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/basic_file_sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8815839Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/basic_file_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8816950Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/callback_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8817867Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/daily_file_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8818708Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/dist_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8819615Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/dup_filter_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8820500Z #47 693.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/hourly_file_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8821338Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/kafka_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8822232Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/mongo_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8823071Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/msvc_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8823899Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/null_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8824760Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/ostream_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8825688Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/qt_sinks.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8826532Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/ringbuffer_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8827427Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/rotating_file_sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8828280Z #47 693.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/rotating_file_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8829086Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8829881Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8830761Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/stdout_color_sinks-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8831626Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/stdout_color_sinks.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8832518Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/stdout_sinks-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8833369Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/stdout_sinks.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8834203Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/syslog_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8835056Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/systemd_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8835854Z #47 693.7 
copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/tcp_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8836672Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/udp_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8837510Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/win_eventlog_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8838386Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/wincolor_sink-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8839229Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/sinks/wincolor_sink.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog/sinks 2025-09-07T09:27:48.8839984Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/spdlog-inl.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8840736Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/spdlog.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8841483Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/stopwatch.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8842226Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/tweakme.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8842981Z #47 693.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/data/spdlog/include/spdlog/version.h -> build/bdist.linux-x86_64/wheel/./flashinfer/data/spdlog/include/spdlog 2025-09-07T09:27:48.8843215Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/fused_moe 2025-09-07T09:27:48.8843754Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/fused_moe 2025-09-07T09:27:48.8844295Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe/core.py -> build/bdist.linux-x86_64/wheel/./flashinfer/fused_moe 2025-09-07T09:27:48.8844824Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/fused_moe/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/fused_moe 2025-09-07T09:27:48.8845023Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/jit 2025-09-07T09:27:48.8845521Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T09:27:48.8846030Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/activation.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T09:27:48.8846504Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/core.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T09:27:48.8847056Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/cpp_ext.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T09:27:48.8847563Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/cubin_loader.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T09:27:48.8848024Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/env.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 2025-09-07T09:27:48.8848508Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit 
2025-09-07T09:27:48.8848780Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/jit/attention 2025-09-07T09:27:48.8849366Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/attention 2025-09-07T09:27:48.8849984Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention/pytorch.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/attention 2025-09-07T09:27:48.8850560Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention/tvm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/attention 2025-09-07T09:27:48.8851146Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/attention 2025-09-07T09:27:48.8851791Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/attention/variants.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/attention 2025-09-07T09:27:48.8852052Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/jit/cutlass_gemm 2025-09-07T09:27:48.8852749Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/cutlass_gemm 2025-09-07T09:27:48.8853610Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm/cutlass_library.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/cutlass_gemm 2025-09-07T09:27:48.8854293Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/jit/cutlass_gemm/generate_kernels.py -> build/bdist.linux-x86_64/wheel/./flashinfer/jit/cutlass_gemm 2025-09-07T09:27:48.8854518Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/testing 2025-09-07T09:27:48.8855065Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/testing/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/testing 2025-09-07T09:27:48.8855601Z #47 693.7 copying 
build/lib.linux-x86_64-cpython-312/flashinfer/testing/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/testing 2025-09-07T09:27:48.8855821Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/triton 2025-09-07T09:27:48.8856354Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T09:27:48.8856914Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/activation.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T09:27:48.8857450Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/cascade.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T09:27:48.8857976Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T09:27:48.8858493Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/norm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T09:27:48.8859009Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/page.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T09:27:48.8859602Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/sm_constraint_gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T09:27:48.8860157Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton 2025-09-07T09:27:48.8860440Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/triton/kernels 2025-09-07T09:27:48.8861065Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T09:27:48.8861721Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/activation.py -> 
build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T09:27:48.8862374Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/cascade.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T09:27:48.8862991Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/norm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T09:27:48.8863606Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/quant.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T09:27:48.8864282Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/triton/kernels/sm_constraint_gemm.py -> build/bdist.linux-x86_64/wheel/./flashinfer/triton/kernels 2025-09-07T09:27:48.8864553Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/tuning_configs 2025-09-07T09:27:48.8865396Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/tuning_configs/v0_1_trtllm_fused_moe_NVIDIA_B200.py -> build/bdist.linux-x86_64/wheel/./flashinfer/tuning_configs 2025-09-07T09:27:48.8865622Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/profiler 2025-09-07T09:27:48.8866171Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/profiler/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/profiler 2025-09-07T09:27:48.8866372Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/comm 2025-09-07T09:27:48.8866858Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8867369Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/cuda_ipc.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8867886Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/dlpack_utils.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8868387Z #47 693.7 
copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/mapping.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8868885Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/mnnvl.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8869382Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/nvshmem.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8869929Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/nvshmem_allreduce.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8870478Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/trtllm_alltoall.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8870978Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/trtllm_ar.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8871505Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/trtllm_mnnvl_ar.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8872001Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/comm/vllm_ar.py -> build/bdist.linux-x86_64/wheel/./flashinfer/comm 2025-09-07T09:27:48.8872207Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/cudnn 2025-09-07T09:27:48.8872702Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/cudnn/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/cudnn 2025-09-07T09:27:48.8873242Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/cudnn/decode.py -> build/bdist.linux-x86_64/wheel/./flashinfer/cudnn 2025-09-07T09:27:48.8873778Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/cudnn/prefill.py -> build/bdist.linux-x86_64/wheel/./flashinfer/cudnn 2025-09-07T09:27:48.8874034Z #47 693.7 creating build/bdist.linux-x86_64/wheel/flashinfer/logits_processor 2025-09-07T09:27:48.8874658Z #47 
693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/__init__.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T09:27:48.8875306Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/compiler.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T09:27:48.8875942Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/fusion_rules.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T09:27:48.8876599Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/legalization.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T09:27:48.8877182Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/op.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T09:27:48.8877806Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/operators.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T09:27:48.8878483Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/pipeline.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T09:27:48.8879122Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/processors.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T09:27:48.8879729Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/types.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T09:27:48.8880376Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/logits_processor/validators.py -> build/bdist.linux-x86_64/wheel/./flashinfer/logits_processor 2025-09-07T09:27:48.8880818Z #47 693.7 copying build/lib.linux-x86_64-cpython-312/flashinfer/py.typed -> build/bdist.linux-x86_64/wheel/./flashinfer 
2025-09-07T09:27:48.8880930Z #47 693.7 running install_egg_info 2025-09-07T09:27:48.8881046Z #47 693.7 running egg_info 2025-09-07T09:27:48.8881190Z #47 693.7 creating flashinfer_python.egg-info 2025-09-07T09:27:48.8881351Z #47 693.7 writing flashinfer_python.egg-info/PKG-INFO 2025-09-07T09:27:48.8881661Z #47 693.7 writing dependency_links to flashinfer_python.egg-info/dependency_links.txt 2025-09-07T09:27:48.8881900Z #47 693.7 writing requirements to flashinfer_python.egg-info/requires.txt 2025-09-07T09:27:48.8882149Z #47 693.7 writing top-level names to flashinfer_python.egg-info/top_level.txt 2025-09-07T09:27:48.8882397Z #47 693.7 writing manifest file 'flashinfer_python.egg-info/SOURCES.txt' 2025-09-07T09:27:48.8882634Z #47 693.8 reading manifest file 'flashinfer_python.egg-info/SOURCES.txt' 2025-09-07T09:27:49.0493959Z #47 693.8 adding license file 'LICENSE' 2025-09-07T09:27:49.0494418Z #47 693.8 adding license file 'licenses/LICENSE.cutlass.txt' 2025-09-07T09:27:49.0494972Z #47 693.8 adding license file 'licenses/LICENSE.flashattention3.txt' 2025-09-07T09:27:49.0495488Z #47 693.8 adding license file 'licenses/LICENSE.fmt.txt' 2025-09-07T09:27:49.0495977Z #47 693.8 adding license file 'licenses/LICENSE.spdlog.txt' 2025-09-07T09:27:49.0496556Z #47 693.8 writing manifest file 'flashinfer_python.egg-info/SOURCES.txt' 2025-09-07T09:27:49.0497431Z #47 693.8 Copying flashinfer_python.egg-info to build/bdist.linux-x86_64/wheel/./flashinfer_python-0.2.14.post1-py3.12.egg-info 2025-09-07T09:27:49.0498170Z #47 693.8 running install_scripts 2025-09-07T09:27:49.0498747Z #47 693.8 creating build/bdist.linux-x86_64/wheel/flashinfer_python-0.2.14.post1.dist-info/WHEEL 2025-09-07T09:27:49.0500104Z #47 693.8 creating '/workspace/wheels/flashinfer/.tmp-xl28m2ff/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it 2025-09-07T09:27:49.0501134Z #47 693.8 adding 'flashinfer/__init__.py' 2025-09-07T09:27:49.0501500Z #47 693.8 
adding 'flashinfer/__main__.py' 2025-09-07T09:27:49.0501880Z #47 693.8 adding 'flashinfer/_build_meta.py' 2025-09-07T09:27:49.0502254Z #47 693.8 adding 'flashinfer/activation.py' 2025-09-07T09:27:49.0502633Z #47 693.8 adding 'flashinfer/aot.py' 2025-09-07T09:27:49.0502992Z #47 693.8 adding 'flashinfer/artifacts.py' 2025-09-07T09:27:49.0503357Z #47 693.8 adding 'flashinfer/attention.py' 2025-09-07T09:27:49.0503809Z #47 693.8 adding 'flashinfer/autotuner.py' 2025-09-07T09:27:49.0504171Z #47 693.8 adding 'flashinfer/cascade.py' 2025-09-07T09:27:49.0504544Z #47 693.8 adding 'flashinfer/cuda_utils.py' 2025-09-07T09:27:49.0505004Z #47 693.8 adding 'flashinfer/decode.py' 2025-09-07T09:27:49.0505366Z #47 693.8 adding 'flashinfer/deep_gemm.py' 2025-09-07T09:27:49.0505749Z #47 693.8 adding 'flashinfer/fp4_quantization.py' 2025-09-07T09:27:49.0506170Z #47 693.8 adding 'flashinfer/fp8_quantization.py' 2025-09-07T09:27:49.0506654Z #47 693.8 adding 'flashinfer/gemm.py' 2025-09-07T09:27:49.0506980Z #47 693.8 adding 'flashinfer/green_ctx.py' 2025-09-07T09:27:49.0507321Z #47 693.8 adding 'flashinfer/mla.py' 2025-09-07T09:27:49.0507634Z #47 693.8 adding 'flashinfer/norm.py' 2025-09-07T09:27:49.0507968Z #47 693.8 adding 'flashinfer/page.py' 2025-09-07T09:27:49.0508324Z #47 693.8 adding 'flashinfer/pod.py' 2025-09-07T09:27:49.0508660Z #47 693.8 adding 'flashinfer/prefill.py' 2025-09-07T09:27:49.0508995Z #47 693.8 adding 'flashinfer/py.typed' 2025-09-07T09:27:49.0509352Z #47 693.8 adding 'flashinfer/quantization.py' 2025-09-07T09:27:49.0509703Z #47 693.8 adding 'flashinfer/rope.py' 2025-09-07T09:27:49.0510043Z #47 693.8 adding 'flashinfer/sampling.py' 2025-09-07T09:27:49.0510392Z #47 693.8 adding 'flashinfer/sparse.py' 2025-09-07T09:27:49.0510731Z #47 693.8 adding 'flashinfer/tllm_utils.py' 2025-09-07T09:27:49.0511080Z #47 693.8 adding 'flashinfer/utils.py' 2025-09-07T09:27:49.0511419Z #47 693.8 adding 'flashinfer/comm/__init__.py' 2025-09-07T09:27:49.0511798Z #47 693.8 adding 
'flashinfer/comm/cuda_ipc.py' 2025-09-07T09:27:49.0512176Z #47 693.9 adding 'flashinfer/comm/dlpack_utils.py' 2025-09-07T09:27:49.0512570Z #47 693.9 adding 'flashinfer/comm/mapping.py' 2025-09-07T09:27:49.0512928Z #47 693.9 adding 'flashinfer/comm/mnnvl.py' 2025-09-07T09:27:49.0513300Z #47 693.9 adding 'flashinfer/comm/nvshmem.py' 2025-09-07T09:27:49.0513703Z #47 693.9 adding 'flashinfer/comm/nvshmem_allreduce.py' 2025-09-07T09:27:49.0514123Z #47 693.9 adding 'flashinfer/comm/trtllm_alltoall.py' 2025-09-07T09:27:49.0514525Z #47 693.9 adding 'flashinfer/comm/trtllm_ar.py' 2025-09-07T09:27:49.0514916Z #47 693.9 adding 'flashinfer/comm/trtllm_mnnvl_ar.py' 2025-09-07T09:27:49.0515311Z #47 693.9 adding 'flashinfer/comm/vllm_ar.py' 2025-09-07T09:27:49.0515677Z #47 693.9 adding 'flashinfer/cudnn/__init__.py' 2025-09-07T09:27:49.0516059Z #47 693.9 adding 'flashinfer/cudnn/decode.py' 2025-09-07T09:27:49.0516424Z #47 693.9 adding 'flashinfer/cudnn/prefill.py' 2025-09-07T09:27:49.0516843Z #47 693.9 adding 'flashinfer/cute_dsl/blockscaled_gemm.py' 2025-09-07T09:27:49.0517267Z #47 693.9 adding 'flashinfer/cute_dsl/utils.py' 2025-09-07T09:27:49.0517653Z #47 693.9 adding 'flashinfer/data/custom_backend.py' 2025-09-07T09:27:49.0518043Z #47 693.9 adding 'flashinfer/data/setup.py' 2025-09-07T09:27:49.0518399Z #47 693.9 adding 'flashinfer/data/version.txt' 2025-09-07T09:27:49.0519938Z #47 694.0 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:27:49.1574859Z #47 694.1 adding 
'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:27:49.3807216Z #47 694.3 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:27:49.6277074Z #47 694.5 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:27:49.8559358Z #47 694.8 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:27:50.0975622Z #47 695.0 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:27:50.2057024Z #47 695.1 adding 
'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:27:50.3092818Z #47 695.2 adding 'flashinfer/data/aot/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/batch_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:27:50.4773596Z #47 695.4 adding 'flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so' 2025-09-07T09:27:50.6535660Z #47 695.4 adding 'flashinfer/data/aot/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so' 2025-09-07T09:27:50.6678616Z #47 695.6 adding 'flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False.so' 2025-09-07T09:27:50.8438366Z #47 695.6 adding 'flashinfer/data/aot/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90/batch_mla_attention_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_ckv_512_head_dim_kpe_64_profiler_False_sm90.so' 
2025-09-07T09:27:51.8288691Z #47 696.7 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so' 2025-09-07T09:27:51.9974648Z #47 696.9 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so' 2025-09-07T09:27:53.1391511Z #47 698.1 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so' 2025-09-07T09:27:53.3150239Z #47 698.2 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so' 2025-09-07T09:27:54.4516776Z #47 699.4 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False_.so' 2025-09-07T09:27:54.6194871Z #47 699.5 adding 
'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_False__sm90.so' 2025-09-07T09:27:55.7531236Z #47 700.7 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True_.so' 2025-09-07T09:27:55.9286222Z #47 700.8 adding 'flashinfer/data/aot/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90/batch_prefill_with_attention_sink_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_use_swa_True__sm90.so' 2025-09-07T09:27:57.3563667Z #47 702.3 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:27:57.5531433Z #47 702.5 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T09:27:57.8028958Z #47 702.6 adding 
'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T09:27:58.7695804Z #47 703.7 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:00.8343100Z #47 705.7 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:02.4098750Z #47 707.3 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:02.5490945Z #47 707.5 adding 
'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T09:28:02.7795593Z #47 707.5 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_e4m3_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T09:28:04.4856062Z #47 709.4 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:05.9474205Z #47 710.9 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:07.4240869Z #47 712.3 adding 
'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:07.6164654Z #47 712.5 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T09:28:07.8647822Z #47 712.6 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_192_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90.so' 2025-09-07T09:28:08.8233734Z #47 713.7 adding 'flashinfer/data/aot/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/batch_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_dtype_idx_i32_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:09.0086384Z #47 713.8 adding 'flashinfer/data/aot/cascade/cascade.so' 2025-09-07T09:28:09.0086995Z #47 713.9 adding 'flashinfer/data/aot/fmha_cutlass_sm100a/fmha_cutlass_sm100a.so' 2025-09-07T09:28:09.1470147Z #47 713.9 adding 'flashinfer/data/aot/logging/logging.so' 2025-09-07T09:28:09.1470643Z #47 714.0 adding 'flashinfer/data/aot/mla/mla.so' 2025-09-07T09:28:09.1471069Z #47 714.1 adding 
'flashinfer/data/aot/norm/norm.so' 2025-09-07T09:28:09.3366415Z #47 714.1 adding 'flashinfer/data/aot/page/page.so' 2025-09-07T09:28:09.3367199Z #47 714.1 adding 'flashinfer/data/aot/quantization/quantization.so' 2025-09-07T09:28:09.4853800Z #47 714.4 adding 'flashinfer/data/aot/rope/rope.so' 2025-09-07T09:28:10.5214428Z #47 715.4 adding 'flashinfer/data/aot/sampling/sampling.so' 2025-09-07T09:28:10.6799593Z #47 715.6 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:28:10.7845379Z #47 715.7 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:28:10.9917557Z #47 715.9 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:28:11.2217378Z #47 716.1 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:28:11.4318813Z #47 716.3 adding 
'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:28:11.6561868Z #47 716.6 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:28:11.7611737Z #47 716.7 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:28:12.0108425Z #47 716.8 adding 'flashinfer/data/aot/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False/single_decode_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False.so' 2025-09-07T09:28:12.4921127Z #47 717.4 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:12.9793978Z #47 717.9 adding 
'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:13.9352773Z #47 718.8 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:14.6674443Z #47 719.6 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_e4m3_dtype_o_bf16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:15.5478868Z #47 720.5 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:16.2217124Z #47 721.1 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_e4m3_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:16.8742904Z #47 721.8 adding 
'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:17.3594668Z #47 722.3 adding 'flashinfer/data/aot/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False/single_prefill_with_kv_cache_dtype_q_f16_dtype_kv_f16_dtype_o_f16_head_dim_qk_64_head_dim_vo_64_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False.so' 2025-09-07T09:28:17.4599697Z #47 722.3 adding 'flashinfer/data/aot/trtllm_utils/trtllm_utils.so' 2025-09-07T09:28:17.4600878Z #47 722.3 adding 'flashinfer/data/csrc/activation.cu' 2025-09-07T09:28:17.4601896Z #47 722.3 adding 'flashinfer/data/csrc/aot_extension_utils.h' 2025-09-07T09:28:17.4602951Z #47 722.3 adding 'flashinfer/data/csrc/batch_attention.cu' 2025-09-07T09:28:17.4603843Z #47 722.3 adding 'flashinfer/data/csrc/batch_attention_customize_config.jinja' 2025-09-07T09:28:17.4604469Z #47 722.3 adding 'flashinfer/data/csrc/batch_attention_jit_pybind.cu' 2025-09-07T09:28:17.4605199Z #47 722.3 adding 'flashinfer/data/csrc/batch_attention_paged_kernel_inst.jinja' 2025-09-07T09:28:17.4605747Z #47 722.3 adding 'flashinfer/data/csrc/batch_decode.cu' 2025-09-07T09:28:17.4606230Z #47 722.3 adding 'flashinfer/data/csrc/batch_decode_config.inc' 2025-09-07T09:28:17.4606786Z #47 722.3 adding 'flashinfer/data/csrc/batch_decode_customize_config.jinja' 2025-09-07T09:28:17.4607377Z #47 722.3 adding 'flashinfer/data/csrc/batch_decode_jit_pybind.cu' 2025-09-07T09:28:17.4607931Z #47 722.3 adding 'flashinfer/data/csrc/batch_decode_kernel_inst.jinja' 2025-09-07T09:28:17.4608491Z #47 722.3 adding 'flashinfer/data/csrc/batch_decode_mla_config.jinja' 2025-09-07T09:28:17.4609199Z #47 722.3 
adding 'flashinfer/data/csrc/batch_decode_mla_cute_sm80.cu' 2025-09-07T09:28:17.4609727Z #47 722.3 adding 'flashinfer/data/csrc/batch_decode_mla_plan.cu' 2025-09-07T09:28:17.4610334Z #47 722.3 adding 'flashinfer/data/csrc/batch_decode_mla_pybind.cu' 2025-09-07T09:28:17.4610844Z #47 722.3 adding 'flashinfer/data/csrc/batch_decode_mla_run.cu' 2025-09-07T09:28:17.4611352Z #47 722.3 adding 'flashinfer/data/csrc/batch_mla_config.jinja' 2025-09-07T09:28:17.4611843Z #47 722.3 adding 'flashinfer/data/csrc/batch_mla_plan.cu' 2025-09-07T09:28:17.4612302Z #47 722.3 adding 'flashinfer/data/csrc/batch_mla_pybind.cu' 2025-09-07T09:28:17.4613035Z #47 722.3 adding 'flashinfer/data/csrc/batch_mla_run.cu' 2025-09-07T09:28:17.4613580Z #47 722.3 adding 'flashinfer/data/csrc/batch_mla_sm90_plan.cu' 2025-09-07T09:28:17.4614101Z #47 722.3 adding 'flashinfer/data/csrc/batch_mla_sm90_pybind.cu' 2025-09-07T09:28:17.4614600Z #47 722.3 adding 'flashinfer/data/csrc/batch_mla_sm90_run.cu' 2025-09-07T09:28:17.4615087Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill.cu' 2025-09-07T09:28:17.4615590Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_config.inc' 2025-09-07T09:28:17.4616178Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_customize_config.jinja' 2025-09-07T09:28:17.4616881Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_fp8_paged_sm90_kernel_inst.jinja' 2025-09-07T09:28:17.4617622Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_fp8_ragged_sm90_kernel_inst.jinja' 2025-09-07T09:28:17.4618404Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_fp8_sm90.cu' 2025-09-07T09:28:17.4618958Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_jit_pybind.cu' 2025-09-07T09:28:17.4619571Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_paged_kernel_inst.jinja' 2025-09-07T09:28:17.4620249Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_paged_sm90_kernel_inst.jinja' 2025-09-07T09:28:17.4620938Z #47 722.3 adding 
'flashinfer/data/csrc/batch_prefill_ragged_kernel_inst.jinja' 2025-09-07T09:28:17.4621706Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_ragged_sm90_kernel_inst.jinja' 2025-09-07T09:28:17.4622370Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_sm90.cu' 2025-09-07T09:28:17.4622896Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_sm90_config.inc' 2025-09-07T09:28:17.4623536Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_sm90_customize_config.jinja' 2025-09-07T09:28:17.4624196Z #47 722.3 adding 'flashinfer/data/csrc/batch_prefill_sm90_jit_pybind.cu' 2025-09-07T09:28:17.4624875Z #47 722.3 adding 'flashinfer/data/csrc/blackwell_fmha_plan.cu' 2025-09-07T09:28:17.4625336Z #47 722.3 adding 'flashinfer/data/csrc/bmm_fp8.cu' 2025-09-07T09:28:17.4625743Z #47 722.3 adding 'flashinfer/data/csrc/cascade.cu' 2025-09-07T09:28:17.4626232Z #47 722.3 adding 'flashinfer/data/csrc/cudnn_sdpa_kernel_launcher.cu' 2025-09-07T09:28:17.4626738Z #47 722.3 adding 'flashinfer/data/csrc/cudnn_sdpa_utils.h' 2025-09-07T09:28:17.4627192Z #47 722.3 adding 'flashinfer/data/csrc/cutlass_mla.cu' 2025-09-07T09:28:17.4627679Z #47 722.3 adding 'flashinfer/data/csrc/flashinfer_cascade_ops.cu' 2025-09-07T09:28:17.4628271Z #47 722.3 adding 'flashinfer/data/csrc/flashinfer_gemm_ops.cu' 2025-09-07T09:28:17.4628793Z #47 722.3 adding 'flashinfer/data/csrc/flashinfer_gemm_sm90_ops.cu' 2025-09-07T09:28:17.4629295Z #47 722.3 adding 'flashinfer/data/csrc/flashinfer_mla_ops.cu' 2025-09-07T09:28:17.4629788Z #47 722.3 adding 'flashinfer/data/csrc/flashinfer_norm_ops.cu' 2025-09-07T09:28:17.4630254Z #47 722.3 adding 'flashinfer/data/csrc/flashinfer_ops.cu' 2025-09-07T09:28:17.4630737Z #47 722.3 adding 'flashinfer/data/csrc/flashinfer_ops_sm90.cu' 2025-09-07T09:28:17.4631238Z #47 722.3 adding 'flashinfer/data/csrc/flashinfer_page_ops.cu' 2025-09-07T09:28:17.4631770Z #47 722.3 adding 'flashinfer/data/csrc/flashinfer_quantization_ops.cu' 2025-09-07T09:28:17.4632375Z #47 722.3 adding 
'flashinfer/data/csrc/flashinfer_rope_ops.cu' 2025-09-07T09:28:17.4632884Z #47 722.3 adding 'flashinfer/data/csrc/flashinfer_sampling_ops.cu' 2025-09-07T09:28:17.4633456Z #47 722.3 adding 'flashinfer/data/csrc/fmha_cutlass_sm100.cu' 2025-09-07T09:28:17.4634005Z #47 722.3 adding 'flashinfer/data/csrc/fmha_cutlass_sm100_pybind.cu' 2025-09-07T09:28:17.4634522Z #47 722.3 adding 'flashinfer/data/csrc/fp4_gemm_cutlass.cu' 2025-09-07T09:28:17.4635013Z #47 722.3 adding 'flashinfer/data/csrc/fp4_gemm_cutlass.jinja' 2025-09-07T09:28:17.4635493Z #47 722.3 adding 'flashinfer/data/csrc/fp8_gemm_cutlass.cu' 2025-09-07T09:28:17.4635984Z #47 722.3 adding 'flashinfer/data/csrc/fp8_gemm_cutlass.jinja' 2025-09-07T09:28:17.4636533Z #47 722.3 adding 'flashinfer/data/csrc/gemm_groupwise_sm100.cu' 2025-09-07T09:28:17.4637138Z #47 722.3 adding 'flashinfer/data/csrc/gemm_groupwise_sm100_kernel_inst.jinja' 2025-09-07T09:28:17.4637731Z #47 722.3 adding 'flashinfer/data/csrc/gemm_sm100_pybind.cu' 2025-09-07T09:28:17.4638188Z #47 722.3 adding 'flashinfer/data/csrc/group_gemm.cu' 2025-09-07T09:28:17.4638703Z #47 722.3 adding 'flashinfer/data/csrc/group_gemm_fp8_groupwise_sm100.cu' 2025-09-07T09:28:17.4639363Z #47 722.3 adding 'flashinfer/data/csrc/group_gemm_fp8_groupwise_sm100_kernel_inst.jinja' 2025-09-07T09:28:17.4640048Z #47 722.3 adding 'flashinfer/data/csrc/group_gemm_mxfp4_groupwise_sm100.cu' 2025-09-07T09:28:17.4640911Z #47 722.3 adding 'flashinfer/data/csrc/group_gemm_mxfp4_groupwise_sm100_kernel_inst.jinja' 2025-09-07T09:28:17.4641560Z #47 722.3 adding 'flashinfer/data/csrc/group_gemm_sm100_pybind.cu' 2025-09-07T09:28:17.4642048Z #47 722.3 adding 'flashinfer/data/csrc/group_gemm_sm90.cu' 2025-09-07T09:28:17.4642622Z #47 722.3 adding 'flashinfer/data/csrc/group_gemm_sm90_kernel_inst.jinja' 2025-09-07T09:28:17.4643136Z #47 722.3 adding 'flashinfer/data/csrc/logging.cc' 2025-09-07T09:28:17.4643534Z #47 722.3 adding 'flashinfer/data/csrc/norm.cu' 2025-09-07T09:28:17.4644100Z #47 722.3 
adding 'flashinfer/data/csrc/nvshmem_binding.cu' 2025-09-07T09:28:17.4644684Z #47 722.3 adding 'flashinfer/data/csrc/page.cu' 2025-09-07T09:28:17.4645085Z #47 722.3 adding 'flashinfer/data/csrc/pod.cu' 2025-09-07T09:28:17.4645492Z #47 722.3 adding 'flashinfer/data/csrc/pod_config.inc' 2025-09-07T09:28:17.4645988Z #47 722.3 adding 'flashinfer/data/csrc/pod_customize_config.jinja' 2025-09-07T09:28:17.4646484Z #47 722.3 adding 'flashinfer/data/csrc/pod_jit_pybind.cu' 2025-09-07T09:28:17.4646963Z #47 722.3 adding 'flashinfer/data/csrc/pod_kernel_inst.jinja' 2025-09-07T09:28:17.4647479Z #47 722.3 adding 'flashinfer/data/csrc/pytorch_conversion_utils.h' 2025-09-07T09:28:17.4648001Z #47 722.3 adding 'flashinfer/data/csrc/pytorch_extension_utils.h' 2025-09-07T09:28:17.4648495Z #47 722.3 adding 'flashinfer/data/csrc/quantization.cu' 2025-09-07T09:28:17.4648914Z #47 722.3 adding 'flashinfer/data/csrc/renorm.cu' 2025-09-07T09:28:17.4649323Z #47 722.3 adding 'flashinfer/data/csrc/rope.cu' 2025-09-07T09:28:17.4649729Z #47 722.3 adding 'flashinfer/data/csrc/runtime_utils.h' 2025-09-07T09:28:17.4650160Z #47 722.3 adding 'flashinfer/data/csrc/sampling.cu' 2025-09-07T09:28:17.4650601Z #47 722.3 adding 'flashinfer/data/csrc/single_decode.cu' 2025-09-07T09:28:17.4651076Z #47 722.3 adding 'flashinfer/data/csrc/single_decode_config.inc' 2025-09-07T09:28:17.4651661Z #47 722.3 adding 'flashinfer/data/csrc/single_decode_customize_config.jinja' 2025-09-07T09:28:17.4652470Z #47 722.3 adding 'flashinfer/data/csrc/single_decode_jit_pybind.cu' 2025-09-07T09:28:17.4653236Z #47 722.3 adding 'flashinfer/data/csrc/single_decode_kernel_inst.jinja' 2025-09-07T09:28:17.4653771Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill.cu' 2025-09-07T09:28:17.4654288Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill_config.inc' 2025-09-07T09:28:17.4654888Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill_customize_config.jinja' 2025-09-07T09:28:17.4655508Z #47 722.3 adding 
'flashinfer/data/csrc/single_prefill_fp8_sm90.cu' 2025-09-07T09:28:17.4656136Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill_fp8_sm90_kernel_inst.jinja' 2025-09-07T09:28:17.4656767Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill_jit_pybind.cu' 2025-09-07T09:28:17.4657440Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill_kernel_inst.jinja' 2025-09-07T09:28:17.4658144Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill_sm90.cu' 2025-09-07T09:28:17.4658702Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill_sm90_config.inc' 2025-09-07T09:28:17.4659360Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill_sm90_customize_config.jinja' 2025-09-07T09:28:17.4660021Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill_sm90_jit_pybind.cu' 2025-09-07T09:28:17.4660747Z #47 722.3 adding 'flashinfer/data/csrc/single_prefill_sm90_kernel_inst.jinja' 2025-09-07T09:28:17.4661313Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_allreduce.cu' 2025-09-07T09:28:17.4661887Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_allreduce_fusion.cu' 2025-09-07T09:28:17.4662399Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_alltoall.cu' 2025-09-07T09:28:17.4662999Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_batched_gemm_runner.cu' 2025-09-07T09:28:17.4663595Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_fmha_kernel_launcher.cu' 2025-09-07T09:28:17.4664178Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_fused_moe_dev_kernel.cu' 2025-09-07T09:28:17.4664904Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_fused_moe_kernel_launcher.cu' 2025-09-07T09:28:17.4665519Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_fused_moe_routing_deepseek.cu' 2025-09-07T09:28:17.4666239Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_fused_moe_routing_llama4.cu' 2025-09-07T09:28:17.4666864Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_fused_moe_routing_renormalize.cu' 2025-09-07T09:28:17.4667508Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_fused_moe_runner.cu' 2025-09-07T09:28:17.4668023Z #47 
722.3 adding 'flashinfer/data/csrc/trtllm_gemm_runner.cu' 2025-09-07T09:28:17.4668523Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_mnnvl_allreduce.cu' 2025-09-07T09:28:17.4669149Z #47 722.3 adding 'flashinfer/data/csrc/trtllm_moe_allreduce_fusion.cu' 2025-09-07T09:28:17.4669687Z #47 722.3 adding 'flashinfer/data/csrc/vllm_custom_all_reduce.cu' 2025-09-07T09:28:17.4670384Z #47 722.3 adding 'flashinfer/data/csrc/fused_moe/cutlass_backend/cutlass_fused_moe_instantiation.cu' 2025-09-07T09:28:17.4671209Z #47 722.3 adding 'flashinfer/data/csrc/fused_moe/cutlass_backend/cutlass_fused_moe_kernels.cuh' 2025-09-07T09:28:17.4672071Z #47 722.3 adding 'flashinfer/data/csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_ops.cu' 2025-09-07T09:28:17.4672843Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/cpp/common/envUtils.cpp' 2025-09-07T09:28:17.4673448Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/cpp/common/logger.cpp' 2025-09-07T09:28:17.4674069Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/cpp/common/memoryUtils.cu' 2025-09-07T09:28:17.4674709Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/cpp/common/stringUtils.cpp' 2025-09-07T09:28:17.4675374Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/cpp/common/tllmException.cpp' 2025-09-07T09:28:17.4676039Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/cpp/kernels/quantization.cu' 2025-09-07T09:28:17.4676767Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/NvInferRuntime.h' 2025-09-07T09:28:17.4677612Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/assert.h' 2025-09-07T09:28:17.4678381Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaBf16Wrapper.h' 2025-09-07T09:28:17.4679190Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaFp8Utils.h' 2025-09-07T09:28:17.4679966Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/cudaUtils.h' 
2025-09-07T09:28:17.4680713Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/dataType.h' 2025-09-07T09:28:17.4681456Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/logger.h' 2025-09-07T09:28:17.4682201Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/quantization.h' 2025-09-07T09:28:17.4683134Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/stringUtils.h' 2025-09-07T09:28:17.4683956Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/include/tensorrt_llm/common/tllmException.h' 2025-09-07T09:28:17.4684734Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cublasMMWrapper.h' 2025-09-07T09:28:17.4685500Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaBf16Fallbacks.cuh' 2025-09-07T09:28:17.4686265Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaDriverWrapper.h' 2025-09-07T09:28:17.4687022Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/cudaTypeUtils.cuh' 2025-09-07T09:28:17.4687759Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/envUtils.h' 2025-09-07T09:28:17.4688461Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/memoryUtils.h' 2025-09-07T09:28:17.4689197Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/quantTypeUtils.cuh' 2025-09-07T09:28:17.4690069Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh' 2025-09-07T09:28:17.4690823Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/common/workspace.h' 2025-09-07T09:28:17.4691753Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/compute_occupancy.h' 2025-09-07T09:28:17.4693458Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue_helpers.h' 
2025-09-07T09:28:17.4694631Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm_configs.h' 2025-09-07T09:28:17.4695864Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/interleaved_numeric_conversion.h' 2025-09-07T09:28:17.4697136Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/system_barrier.h' 2025-09-07T09:28:17.4698343Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/tile_interleaved_layout.h' 2025-09-07T09:28:17.4699597Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/weight_only_quant_op.h' 2025-09-07T09:28:17.4700952Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_red_global.hpp' 2025-09-07T09:28:17.4702213Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_sm90_multimem.hpp' 2025-09-07T09:28:17.4703543Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/copy_traits_sm90_multimem.hpp' 2025-09-07T09:28:17.4705017Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/grid_dependency_control.h' 2025-09-07T09:28:17.4706244Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/arch/mma.h' 2025-09-07T09:28:17.4707612Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/communication/collective/sm90_allreduce_nvls_warpspecialized.hpp' 2025-09-07T09:28:17.4709113Z #47 722.4 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/detail/collective/mixed_input_utils.hpp' 2025-09-07T09:28:17.4710764Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/collective/epilogue_moe_finalize.hpp' 2025-09-07T09:28:17.4712316Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_allreduce_tma_warpspecialized.hpp' 2025-09-07T09:28:17.4713954Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/thread/fused_activations.h' 2025-09-07T09:28:17.4715642Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_gated.hpp' 2025-09-07T09:28:17.4717108Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_interleaved.hpp' 2025-09-07T09:28:17.4718762Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_builder_mixed_input.hpp' 2025-09-07T09:28:17.4720361Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_array_mixed_input.hpp' 2025-09-07T09:28:17.4721815Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_gated.hpp' 2025-09-07T09:28:17.4723278Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/collective_mma_interleaved.hpp' 2025-09-07T09:28:17.4724869Z #47 722.4 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input_.hpp' 2025-09-07T09:28:17.4726801Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized.hpp' 2025-09-07T09:28:17.4728429Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_gated_tma_gmma_ss_warpspecialized_fp8.hpp' 2025-09-07T09:28:17.5767042Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/sm90_mma_interleaved_tma_gmma_rs_warpspecialized_mixed_input.hpp' 2025-09-07T09:28:17.5768744Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_gated.inl' 2025-09-07T09:28:17.5770287Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_interleaved.inl' 2025-09-07T09:28:17.5771872Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/collective/builders/sm90_gmma_builder_mixed_input.inl' 2025-09-07T09:28:17.5773646Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/default_fpA_intB_traits.h' 2025-09-07T09:28:17.5775003Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel.cuh' 2025-09-07T09:28:17.5776388Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_routine.cuh' 2025-09-07T09:28:17.5777804Z #47 722.4 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fused_moe_kernel_traits.cuh' 2025-09-07T09:28:17.5779185Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_moe_problem_visitor.h' 2025-09-07T09:28:17.5780594Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/gemm_universal_allreduce.hpp' 2025-09-07T09:28:17.5781964Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/mixed_gemm_B_layout.h' 2025-09-07T09:28:17.5783285Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cute_util.cuh' 2025-09-07T09:28:17.5784846Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_cutlass_kernel.h' 2025-09-07T09:28:17.5786183Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/moe_problem_visitor.h' 2025-09-07T09:28:17.5787692Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized.hpp' 2025-09-07T09:28:17.5789235Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/sm90_gemm_allreduce_tma_warpspecialized_pingpong.hpp' 2025-09-07T09:28:17.5790718Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma.h' 2025-09-07T09:28:17.5792410Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_multistage.h' 2025-09-07T09:28:17.5794061Z #47 722.4 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_dq_mma_pipelined.h' 2025-09-07T09:28:17.5804055Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma.h' 2025-09-07T09:28:17.5805676Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/default_mma_bf16.h' 2025-09-07T09:28:17.5807089Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_base.h' 2025-09-07T09:28:17.5808546Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage.h' 2025-09-07T09:28:17.5809953Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_finegrained.h' 2025-09-07T09:28:17.5811405Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_multistage_percol.h' 2025-09-07T09:28:17.5813035Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined.h' 2025-09-07T09:28:17.5814490Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_finegrained.h' 2025-09-07T09:28:17.5815973Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/threadblock/dq_mma_pipelined_percol.h' 2025-09-07T09:28:17.5817361Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/default_mma_tensor_op.h' 2025-09-07T09:28:17.5818774Z #47 722.4 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_compute_B_with_f16.h' 2025-09-07T09:28:17.5820194Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/warp/mma_tensorop_dequantizer.h' 2025-09-07T09:28:17.5821675Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock/fine_grained_scale_zero_iterator.h' 2025-09-07T09:28:17.5823092Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/util/gather_tensor.hpp' 2025-09-07T09:28:17.5824081Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/delayStream.cu' 2025-09-07T09:28:17.5824937Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/delayStream.h' 2025-09-07T09:28:17.5825712Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.cu' 2025-09-07T09:28:17.5826575Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/preQuantScaleKernel.h' 2025-09-07T09:28:17.5827401Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/quantization.cuh' 2025-09-07T09:28:17.5828129Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/quantization.h' 2025-09-07T09:28:17.5828975Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp' 2025-09-07T09:28:17.5829933Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.h' 2025-09-07T09:28:17.5830934Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_type_conversion.h' 2025-09-07T09:28:17.5832009Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm.h' 
2025-09-07T09:28:17.5833182Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fp8_blockscale_gemm/fp8_blockscale_gemm_stub.cu' 2025-09-07T09:28:17.5834376Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scalebias.cu' 2025-09-07T09:28:17.5835562Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scaleonly.cu' 2025-09-07T09:28:17.5836735Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_per_col.cu' 2025-09-07T09:28:17.5837895Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scalebias.cu' 2025-09-07T09:28:17.5839073Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scaleonly.cu' 2025-09-07T09:28:17.5840209Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_per_col.cu' 2025-09-07T09:28:17.5841428Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_bf16_out_bf16.cu' 2025-09-07T09:28:17.5842700Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scalebias_f16_out_f16.cu' 2025-09-07T09:28:17.5843985Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_bf16_out_bf16.cu' 2025-09-07T09:28:17.5845266Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_fg_scaleonly_f16_out_f16.cu' 2025-09-07T09:28:17.5846503Z #47 722.4 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/e4m3_int4_gemm_per_col_f16_out_f16.cu' 2025-09-07T09:28:17.5847706Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scalebias.cu' 2025-09-07T09:28:17.5848880Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scaleonly.cu' 2025-09-07T09:28:17.5850015Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_per_col.cu' 2025-09-07T09:28:17.5851170Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scalebias.cu' 2025-09-07T09:28:17.5852329Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scaleonly.cu' 2025-09-07T09:28:17.5853757Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_per_col.cu' 2025-09-07T09:28:17.5854871Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm.h' 2025-09-07T09:28:17.5856014Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template.h' 2025-09-07T09:28:17.5857229Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/fpA_intB_gemm_template_sm90.h' 2025-09-07T09:28:17.5858447Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.h' 2025-09-07T09:28:17.5859726Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/fpA_intB_gemm/launchers/fpA_intB_launcher_sm90.inl' 2025-09-07T09:28:17.5860869Z #47 722.4 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/common.h' 2025-09-07T09:28:17.5861873Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/cutlass_kernel_selector.h' 2025-09-07T09:28:17.5862956Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_gemm_kernels.h' 2025-09-07T09:28:17.5863967Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_kernels.h' 2025-09-07T09:28:17.5865056Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/include/moe_util_kernels.h' 2025-09-07T09:28:17.5866118Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_bf16.cu' 2025-09-07T09:28:17.5867251Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp4.cu' 2025-09-07T09:28:17.5868456Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_fp8.cu' 2025-09-07T09:28:17.5869703Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint4.cu' 2025-09-07T09:28:17.5870815Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_bf16_uint8.cu' 2025-09-07T09:28:17.5871940Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp16.cu' 2025-09-07T09:28:17.5873036Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_fp4.cu' 2025-09-07T09:28:17.5874153Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint4.cu' 2025-09-07T09:28:17.5875273Z #47 722.4 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp16_uint8.cu' 2025-09-07T09:28:17.5876379Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp32_fp32.cu' 2025-09-07T09:28:17.5877487Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp4_fp4.cu' 2025-09-07T09:28:17.5878584Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp4.cu' 2025-09-07T09:28:17.5879669Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_fp8.cu' 2025-09-07T09:28:17.5880773Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_fp8_uint4.cu' 2025-09-07T09:28:17.5881880Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch.h' 2025-09-07T09:28:17.5883031Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws.h' 2025-09-07T09:28:17.5884268Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_template_dispatch_tma_ws_mixed_dtype.h' 2025-09-07T09:28:17.5885550Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_tma_warp_specialized_input.cu' 2025-09-07T09:28:17.5886768Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_tma_warp_specialized_traits.h' 2025-09-07T09:28:17.5887948Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.h' 2025-09-07T09:28:17.5889264Z #47 722.4 adding 
'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/fused_moe_gemm_launcher_sm80.inl' 2025-09-07T09:28:17.5890474Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.h' 2025-09-07T09:28:17.5891626Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_launcher.inl' 2025-09-07T09:28:17.5893460Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.h' 2025-09-07T09:28:17.5894831Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/launchers/moe_gemm_tma_ws_mixed_input_launcher.inl' 2025-09-07T09:28:17.5895874Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora/lora.cpp' 2025-09-07T09:28:17.5896615Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/kernels/lora/lora.h' 2025-09-07T09:28:17.5897393Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/runtime/torchUtils.h' 2025-09-07T09:28:17.5898114Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Op.cpp' 2025-09-07T09:28:17.5898824Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.cpp' 2025-09-07T09:28:17.5899539Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp4Quantize.h' 2025-09-07T09:28:17.5900275Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.cpp' 2025-09-07T09:28:17.5900989Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/fp8Quantize.h' 2025-09-07T09:28:17.5901684Z #47 722.4 adding 'flashinfer/data/csrc/nv_internal/tensorrt_llm/thop/thUtils.h' 2025-09-07T09:28:17.5902299Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/config.hpp' 2025-09-07T09:28:17.5902868Z #47 722.4 adding 
'flashinfer/data/cutlass/include/cute/int_tuple.hpp' 2025-09-07T09:28:17.5903439Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/layout.hpp' 2025-09-07T09:28:17.5904032Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/layout_composed.hpp' 2025-09-07T09:28:17.5904762Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/pointer.hpp' 2025-09-07T09:28:17.5905425Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/pointer_base.hpp' 2025-09-07T09:28:17.5906025Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/pointer_flagged.hpp' 2025-09-07T09:28:17.5906623Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/pointer_sparse.hpp' 2025-09-07T09:28:17.5907233Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/pointer_swizzle.hpp' 2025-09-07T09:28:17.5907804Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/stride.hpp' 2025-09-07T09:28:17.5908317Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/swizzle.hpp' 2025-09-07T09:28:17.5908879Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/swizzle_layout.hpp' 2025-09-07T09:28:17.5909433Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/tensor.hpp' 2025-09-07T09:28:17.5909980Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/tensor_impl.hpp' 2025-09-07T09:28:17.5910535Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/tensor_zip.hpp' 2025-09-07T09:28:17.5911101Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/underscore.hpp' 2025-09-07T09:28:17.5911692Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/axpby.hpp' 2025-09-07T09:28:17.5912339Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/clear.hpp' 2025-09-07T09:28:17.5913074Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/cooperative_copy.hpp' 2025-09-07T09:28:17.5913804Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/cooperative_gemm.hpp' 2025-09-07T09:28:17.5914483Z #47 722.4 adding 
'flashinfer/data/cutlass/include/cute/algorithm/copy.hpp' 2025-09-07T09:28:17.5915088Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/fill.hpp' 2025-09-07T09:28:17.5915715Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/functional.hpp' 2025-09-07T09:28:17.5916352Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/gemm.hpp' 2025-09-07T09:28:17.5916989Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/prefer.hpp' 2025-09-07T09:28:17.5917628Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/prefetch.hpp' 2025-09-07T09:28:17.5918311Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/tensor_algorithms.hpp' 2025-09-07T09:28:17.5919043Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/tensor_reduce.hpp' 2025-09-07T09:28:17.5919765Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/algorithm/tuple_algorithms.hpp' 2025-09-07T09:28:17.5920441Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/cluster_sm100.hpp' 2025-09-07T09:28:17.5921081Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/cluster_sm90.hpp' 2025-09-07T09:28:17.5921699Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/config.hpp' 2025-09-07T09:28:17.5922261Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/copy.hpp' 2025-09-07T09:28:17.5922833Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm100.hpp' 2025-09-07T09:28:17.5923466Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm100_tma.hpp' 2025-09-07T09:28:17.5924089Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm50.hpp' 2025-09-07T09:28:17.5924680Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm75.hpp' 2025-09-07T09:28:17.5925284Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm80.hpp' 2025-09-07T09:28:17.5925865Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm90.hpp' 
2025-09-07T09:28:17.5926484Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm90_desc.hpp' 2025-09-07T09:28:17.5927125Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/copy_sm90_tma.hpp' 2025-09-07T09:28:17.5927705Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma.hpp' 2025-09-07T09:28:17.5928269Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm100.hpp' 2025-09-07T09:28:17.5928877Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm100_desc.hpp' 2025-09-07T09:28:17.5929517Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm100_umma.hpp' 2025-09-07T09:28:17.5930121Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm120.hpp' 2025-09-07T09:28:17.5930754Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm120_sparse.hpp' 2025-09-07T09:28:17.5931387Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm61.hpp' 2025-09-07T09:28:17.5931961Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm70.hpp' 2025-09-07T09:28:17.5932629Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm75.hpp' 2025-09-07T09:28:17.5933390Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm80.hpp' 2025-09-07T09:28:17.5934015Z #47 722.4 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm89.hpp' 2025-09-07T09:28:17.5934623Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm90.hpp' 2025-09-07T09:28:17.5935271Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm90_desc.hpp' 2025-09-07T09:28:17.5935947Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma.hpp' 2025-09-07T09:28:17.5936632Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_ext.hpp' 2025-09-07T09:28:17.6767431Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_sparse.hpp' 2025-09-07T09:28:17.6768441Z #47 722.5 adding 
'flashinfer/data/cutlass/include/cute/arch/mma_sm90_gmma_sparse_ext.hpp' 2025-09-07T09:28:17.6769137Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/arch/simd_sm100.hpp' 2025-09-07T09:28:17.6769824Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/arch/tmem_allocator_sm100.hpp' 2025-09-07T09:28:17.6770471Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/arch/util.hpp' 2025-09-07T09:28:17.6771064Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/atom/copy_atom.hpp' 2025-09-07T09:28:17.6771741Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits.hpp' 2025-09-07T09:28:17.6772529Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100.hpp' 2025-09-07T09:28:17.6773471Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100_im2col.hpp' 2025-09-07T09:28:17.6774250Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm100_tma.hpp' 2025-09-07T09:28:17.6775001Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm50.hpp' 2025-09-07T09:28:17.6775707Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm75.hpp' 2025-09-07T09:28:17.6776416Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm80.hpp' 2025-09-07T09:28:17.6777182Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90.hpp' 2025-09-07T09:28:17.6777924Z #47 722.5 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_im2col.hpp' 2025-09-07T09:28:17.6778703Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_tma.hpp' 2025-09-07T09:28:17.6779492Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/copy_traits_sm90_tma_swizzle.hpp' 2025-09-07T09:28:17.6780234Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_atom.hpp' 2025-09-07T09:28:17.6780878Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits.hpp' 2025-09-07T09:28:17.6781551Z #47 722.6 
adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm100.hpp' 2025-09-07T09:28:17.6782264Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm120.hpp' 2025-09-07T09:28:17.6782996Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm120_sparse.hpp' 2025-09-07T09:28:17.6783739Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm61.hpp' 2025-09-07T09:28:17.6784575Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm70.hpp' 2025-09-07T09:28:17.6785249Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm75.hpp' 2025-09-07T09:28:17.6785921Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm80.hpp' 2025-09-07T09:28:17.6786580Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm89.hpp' 2025-09-07T09:28:17.6787262Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90.hpp' 2025-09-07T09:28:17.6787951Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma.hpp' 2025-09-07T09:28:17.6788701Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_ext.hpp' 2025-09-07T09:28:17.6789477Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse.hpp' 2025-09-07T09:28:17.6790305Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/mma_traits_sm90_gmma_sparse_ext.hpp' 2025-09-07T09:28:17.6791057Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/atom/partitioner.hpp' 2025-09-07T09:28:17.6791711Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/container/alignment.hpp' 2025-09-07T09:28:17.6792570Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/container/array.hpp' 2025-09-07T09:28:17.6793469Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/container/array_aligned.hpp' 2025-09-07T09:28:17.6794288Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/container/array_subbyte.hpp' 
2025-09-07T09:28:17.6795055Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/container/bit_field.hpp' 2025-09-07T09:28:17.6795745Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/container/cuda_types.hpp' 2025-09-07T09:28:17.6796422Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/container/tuple.hpp' 2025-09-07T09:28:17.6797092Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/container/type_list.hpp' 2025-09-07T09:28:17.6797806Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/numeric/arithmetic_tuple.hpp' 2025-09-07T09:28:17.6798509Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/numeric/complex.hpp' 2025-09-07T09:28:17.6799161Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/numeric/int.hpp' 2025-09-07T09:28:17.6799837Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/numeric/integer_sequence.hpp' 2025-09-07T09:28:17.6800580Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/numeric/integral_constant.hpp' 2025-09-07T09:28:17.6801330Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/numeric/integral_ratio.hpp' 2025-09-07T09:28:17.6802005Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/numeric/math.hpp' 2025-09-07T09:28:17.6802677Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/numeric/numeric_types.hpp' 2025-09-07T09:28:17.6803331Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/numeric/real.hpp' 2025-09-07T09:28:17.6803976Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/util/debug.hpp' 2025-09-07T09:28:17.6804684Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/util/print.hpp' 2025-09-07T09:28:17.6805279Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/util/print_latex.hpp' 2025-09-07T09:28:17.6805911Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/util/print_svg.hpp' 2025-09-07T09:28:17.6806534Z #47 722.6 adding 'flashinfer/data/cutlass/include/cute/util/print_tensor.hpp' 2025-09-07T09:28:17.6807182Z #47 722.6 adding 
'flashinfer/data/cutlass/include/cute/util/type_traits.hpp' 2025-09-07T09:28:17.6807805Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/aligned_buffer.h' 2025-09-07T09:28:17.6808395Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/array.h' 2025-09-07T09:28:17.6809016Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/array_planar_complex.h' 2025-09-07T09:28:17.7767855Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/array_subbyte.h' 2025-09-07T09:28:17.7768650Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/barrier.h' 2025-09-07T09:28:17.7769478Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/bfloat16.h' 2025-09-07T09:28:17.7770133Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/blas3.h' 2025-09-07T09:28:17.7770709Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/blas3_types.h' 2025-09-07T09:28:17.7771306Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/block_striped.h' 2025-09-07T09:28:17.7772065Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/cluster_launch.hpp' 2025-09-07T09:28:17.7772756Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/complex.h' 2025-09-07T09:28:17.7773519Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/constants.h' 2025-09-07T09:28:17.7774081Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/coord.h' 2025-09-07T09:28:17.7774642Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/core_io.h' 2025-09-07T09:28:17.7775367Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/cuda_host_adapter.hpp' 2025-09-07T09:28:17.7775993Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/cutlass.h' 2025-09-07T09:28:17.7776594Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/device_kernel.h' 2025-09-07T09:28:17.7777195Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/exmy_base.h' 2025-09-07T09:28:17.7777783Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/fast_math.h' 
2025-09-07T09:28:17.7778545Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/float8.h' 2025-09-07T09:28:17.7779215Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/float_subbyte.h' 2025-09-07T09:28:17.7779944Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/floating_point_nvrtc.h' 2025-09-07T09:28:17.7780605Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/functional.h' 2025-09-07T09:28:17.7781196Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/gemm_coord.h' 2025-09-07T09:28:17.7781810Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/gemm_coord.hpp' 2025-09-07T09:28:17.7782402Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/half.h' 2025-09-07T09:28:17.7782986Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/integer_subbyte.h' 2025-09-07T09:28:17.7783759Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/kernel_hardware_info.h' 2025-09-07T09:28:17.7784651Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/kernel_hardware_info.hpp' 2025-09-07T09:28:17.7785325Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/kernel_launch.h' 2025-09-07T09:28:17.7785899Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/matrix.h' 2025-09-07T09:28:17.7786468Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/matrix_coord.h' 2025-09-07T09:28:17.7787072Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/matrix_shape.h' 2025-09-07T09:28:17.7787775Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/numeric_conversion.h' 2025-09-07T09:28:17.7788467Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/numeric_size.h' 2025-09-07T09:28:17.7789065Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/numeric_types.h' 2025-09-07T09:28:17.7789707Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/pitch_linear_coord.h' 2025-09-07T09:28:17.7790423Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/predicate_vector.h' 
2025-09-07T09:28:17.7791043Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/quaternion.h' 2025-09-07T09:28:17.7791604Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/real.h' 2025-09-07T09:28:17.7792532Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/relatively_equal.h' 2025-09-07T09:28:17.7793173Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/semaphore.h' 2025-09-07T09:28:17.7793802Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/subbyte_reference.h' 2025-09-07T09:28:17.7794460Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/tensor_coord.h' 2025-09-07T09:28:17.7795184Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/tensor_ref.h' 2025-09-07T09:28:17.7795872Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/tensor_ref_planar_complex.h' 2025-09-07T09:28:17.7796575Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/tensor_view.h' 2025-09-07T09:28:17.7797259Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/tensor_view_planar_complex.h' 2025-09-07T09:28:17.7798021Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/tfloat32.h' 2025-09-07T09:28:17.7798581Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/trace.h' 2025-09-07T09:28:17.7799145Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/uint128.h' 2025-09-07T09:28:17.7799717Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/version.h' 2025-09-07T09:28:17.7800285Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/wmma_array.h' 2025-09-07T09:28:17.7800888Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/workspace.h' 2025-09-07T09:28:17.7801469Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/arch.h' 2025-09-07T09:28:17.7802076Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/barrier.h' 2025-09-07T09:28:17.7802735Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/cache_operation.h' 2025-09-07T09:28:17.7803398Z #47 722.6 
adding 'flashinfer/data/cutlass/include/cutlass/arch/config.h' 2025-09-07T09:28:17.7804107Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/grid_dependency_control.h' 2025-09-07T09:28:17.7805446Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/memory.h' 2025-09-07T09:28:17.7806122Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/memory_sm75.h' 2025-09-07T09:28:17.7806755Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/memory_sm80.h' 2025-09-07T09:28:17.7807362Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma.h' 2025-09-07T09:28:17.7807938Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm100.h' 2025-09-07T09:28:17.7808558Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm50.h' 2025-09-07T09:28:17.7809237Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm60.h' 2025-09-07T09:28:17.7809882Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm61.h' 2025-09-07T09:28:17.7810493Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm70.h' 2025-09-07T09:28:17.7811088Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm75.h' 2025-09-07T09:28:17.7811698Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm80.h' 2025-09-07T09:28:17.7812298Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm89.h' 2025-09-07T09:28:17.7813174Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sm90.h' 2025-09-07T09:28:17.7813845Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sparse_sm80.h' 2025-09-07T09:28:17.7814647Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/mma_sparse_sm89.h' 2025-09-07T09:28:17.7815414Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/reg_reconfig.h' 2025-09-07T09:28:17.7816040Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/simd.h' 2025-09-07T09:28:17.7816659Z #47 
722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/simd_sm60.h' 2025-09-07T09:28:17.7817293Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/simd_sm61.h' 2025-09-07T09:28:17.7817927Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/synclog.hpp' 2025-09-07T09:28:17.7818615Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/wmma.h' 2025-09-07T09:28:17.7819220Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/wmma_sm70.h' 2025-09-07T09:28:17.7819860Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/wmma_sm72.h' 2025-09-07T09:28:17.7820485Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/arch/wmma_sm75.h' 2025-09-07T09:28:17.7821283Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/conv2d_problem_size.h' 2025-09-07T09:28:17.7822035Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/conv3d_problem_size.h' 2025-09-07T09:28:17.7822799Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/convnd_problem_shape.hpp' 2025-09-07T09:28:17.7823525Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/convolution.h' 2025-09-07T09:28:17.7824160Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/detail.hpp' 2025-09-07T09:28:17.7824954Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/dispatch_policy.hpp' 2025-09-07T09:28:17.7825817Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/collective_builder.hpp' 2025-09-07T09:28:17.7826671Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/collective_conv.hpp' 2025-09-07T09:28:17.7827454Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/detail.hpp' 2025-09-07T09:28:17.7828382Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/sm100_implicit_gemm_umma_warpspecialized.hpp' 2025-09-07T09:28:17.7829530Z #47 722.6 adding 
'flashinfer/data/cutlass/include/cutlass/conv/collective/sm90_implicit_gemm_gmma_ss_warpspecialized.hpp' 2025-09-07T09:28:17.7830604Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm100_common.inl' 2025-09-07T09:28:17.7831546Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm100_umma_builder.inl' 2025-09-07T09:28:17.7832523Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm90_common.inl' 2025-09-07T09:28:17.7833473Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/collective/builders/sm90_gmma_builder.inl' 2025-09-07T09:28:17.7834393Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/device/conv_universal_adapter.hpp' 2025-09-07T09:28:17.7835216Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/device/direct_convolution.h' 2025-09-07T09:28:17.7836068Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/device/implicit_gemm_convolution.h' 2025-09-07T09:28:17.7837097Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/device/implicit_gemm_convolution_fusion.h' 2025-09-07T09:28:17.7837953Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/conv_universal.hpp' 2025-09-07T09:28:17.7838729Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d.h' 2025-09-07T09:28:17.7839513Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_dgrad.h' 2025-09-07T09:28:17.7840337Z #47 722.6 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop.h' 2025-09-07T09:28:17.7841197Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_fusion.h' 2025-09-07T09:28:17.7842330Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_absmax.h' 2025-09-07T09:28:17.7843337Z #47 722.7 adding 
'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_broadcast.h' 2025-09-07T09:28:17.7844317Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop_with_reduction.h' 2025-09-07T09:28:17.7845250Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_group_fprop.h' 2025-09-07T09:28:17.7846105Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad.h' 2025-09-07T09:28:17.7846952Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv2d_wgrad_fusion.h' 2025-09-07T09:28:17.7848050Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_dgrad.h' 2025-09-07T09:28:17.7848907Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop.h' 2025-09-07T09:28:17.7849763Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_fusion.h' 2025-09-07T09:28:17.7850704Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_fprop_with_broadcast.h' 2025-09-07T09:28:17.7851600Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_conv3d_wgrad.h' 2025-09-07T09:28:17.7852488Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv2d.h' 2025-09-07T09:28:17.7853527Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv2d_with_broadcast.h' 2025-09-07T09:28:17.7854417Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv3d.h' 2025-09-07T09:28:17.7855302Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_deconv3d_with_broadcast.h' 2025-09-07T09:28:17.7856219Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/default_depthwise_fprop.h' 2025-09-07T09:28:17.7857069Z #47 722.7 adding 
'flashinfer/data/cutlass/include/cutlass/conv/kernel/direct_convolution.h' 2025-09-07T09:28:17.7857929Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution.h' 2025-09-07T09:28:17.7858877Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_fusion.h' 2025-09-07T09:28:17.7859899Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_strided_dgrad.h' 2025-09-07T09:28:17.7860944Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_absmax.h' 2025-09-07T09:28:17.7862100Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/implicit_gemm_convolution_with_fused_epilogue.h' 2025-09-07T09:28:17.7863245Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/sm100_implicit_gemm_tma_warpspecialized.hpp' 2025-09-07T09:28:17.7864428Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/kernel/sm90_implicit_gemm_tma_warpspecialized.hpp' 2025-09-07T09:28:17.7865427Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/thread/depthwise_mma.h' 2025-09-07T09:28:17.7866380Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7867587Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_filter_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7868801Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7870246Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_dgrad_output_gradient_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7871720Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_analytic.h' 
2025-09-07T09:28:17.7872999Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_few_channels.h' 2025-09-07T09:28:17.7874352Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_fixed_channels.h' 2025-09-07T09:28:17.7875641Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_activation_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7876879Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7878123Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_few_channels.h' 2025-09-07T09:28:17.7879372Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_fixed_channels.h' 2025-09-07T09:28:17.7880623Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_fprop_filter_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7881859Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_params.h' 2025-09-07T09:28:17.7882698Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_tile_iterator.h' 2025-09-07T09:28:17.7883772Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7885019Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_activation_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7886302Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7887614Z #47 722.7 adding 
'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv2d_wgrad_output_gradient_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7888861Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7890075Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_filter_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7891326Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7893051Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_dgrad_output_gradient_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7894464Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7895792Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_activation_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7897070Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7898323Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_fprop_filter_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7899351Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_params.h' 2025-09-07T09:28:17.7900685Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7901980Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_activation_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7903300Z #47 722.7 adding 
'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_analytic.h' 2025-09-07T09:28:17.7904771Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/conv3d_wgrad_output_gradient_tile_access_iterator_optimized.h' 2025-09-07T09:28:17.7905903Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_direct_conv_params.h' 2025-09-07T09:28:17.7907224Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_fixed_stride_dilation.h' 2025-09-07T09:28:17.7908789Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_activation_tile_access_iterator_direct_conv_optimized.h' 2025-09-07T09:28:17.7910058Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_direct_conv_multistage.h' 2025-09-07T09:28:17.7911303Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_filter_tile_access_iterator_direct_conv_optimized.h' 2025-09-07T09:28:17.7912465Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_fprop_pipelined.h' 2025-09-07T09:28:17.7913369Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_mma_base.h' 2025-09-07T09:28:17.7914361Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/depthwise_mma_core_with_lane_access_size.h' 2025-09-07T09:28:17.7915631Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_fprop_fusion_multistage.h' 2025-09-07T09:28:17.7916695Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_multistage.h' 2025-09-07T09:28:17.7917610Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_pipelined.h' 2025-09-07T09:28:17.7918620Z #47 722.7 adding 
'flashinfer/data/cutlass/include/cutlass/conv/threadblock/implicit_gemm_wgrad_fusion_multistage.h' 2025-09-07T09:28:17.7919746Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_access_iterator.h' 2025-09-07T09:28:17.7920863Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/predicated_scale_bias_vector_iterator.h' 2025-09-07T09:28:17.7921838Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/threadblock/threadblock_swizzle.h' 2025-09-07T09:28:17.7922654Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/warp/mma_depthwise_simt.h' 2025-09-07T09:28:17.7923508Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/warp/mma_depthwise_simt_tile_iterator.h' 2025-09-07T09:28:17.7924391Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/conv/warp/scale_bias_relu_transform.h' 2025-09-07T09:28:17.7925210Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/detail/blockwise_scale_layout.hpp' 2025-09-07T09:28:17.8770837Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/detail/cluster.hpp' 2025-09-07T09:28:17.8771971Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/detail/collective.hpp' 2025-09-07T09:28:17.8772853Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/detail/dependent_false.hpp' 2025-09-07T09:28:17.8773794Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/detail/helper_macros.hpp' 2025-09-07T09:28:17.8774501Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/detail/layout.hpp' 2025-09-07T09:28:17.8775327Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/detail/mainloop_fusion_helper_scale_factor.hpp' 2025-09-07T09:28:17.8776206Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/detail/mma.hpp' 2025-09-07T09:28:17.8777007Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/detail/sm100_blockscaled_layout.hpp' 2025-09-07T09:28:17.8777839Z #47 722.7 adding 
'flashinfer/data/cutlass/include/cutlass/detail/sm100_tmem_helper.hpp' 2025-09-07T09:28:17.8778704Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/detail/collective/mixed_input_utils.hpp' 2025-09-07T09:28:17.8779555Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/dispatch_policy.hpp' 2025-09-07T09:28:17.8780436Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/collective_builder.hpp' 2025-09-07T09:28:17.8781397Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/collective_epilogue.hpp' 2025-09-07T09:28:17.8782411Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/default_epilogue.hpp' 2025-09-07T09:28:17.8783389Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/default_epilogue_array.hpp' 2025-09-07T09:28:17.8784410Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/detail.hpp' 2025-09-07T09:28:17.8785320Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/epilogue_tensor_broadcast.hpp' 2025-09-07T09:28:17.8786332Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_nosmem.hpp' 2025-09-07T09:28:17.8787448Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_array_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8788527Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_nosmem.hpp' 2025-09-07T09:28:17.8789557Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm100_epilogue_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8790622Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp' 2025-09-07T09:28:17.8791644Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm70_epilogue_vectorized_array.hpp' 2025-09-07T09:28:17.8793144Z #47 722.7 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_array_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8794333Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8795563Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/sm90_epilogue_tma_warpspecialized_bias_elementwise.hpp' 2025-09-07T09:28:17.8796731Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm100_builder.inl' 2025-09-07T09:28:17.8797716Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm120_builder.inl' 2025-09-07T09:28:17.8798705Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm120_common.inl' 2025-09-07T09:28:17.8799694Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm90_builder.inl' 2025-09-07T09:28:17.8800665Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/collective/builders/sm90_common.inl' 2025-09-07T09:28:17.8801623Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/callbacks.hpp' 2025-09-07T09:28:17.8802474Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/operations.hpp' 2025-09-07T09:28:17.8803442Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_callbacks_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8804587Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_compute_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8805820Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm100_visitor_store_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8806984Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm120_callbacks_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8808065Z #47 722.7 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm120_visitor_store_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8809256Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_callbacks_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8810330Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_compute_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8811400Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_load_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8812529Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_store_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8813834Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_tma_warpspecialized.hpp' 2025-09-07T09:28:17.8814850Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp' 2025-09-07T09:28:17.8815737Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/activation.h' 2025-09-07T09:28:17.8816542Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/conversion_op.h' 2025-09-07T09:28:17.8817340Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/detail.hpp' 2025-09-07T09:28:17.8818156Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination.h' 2025-09-07T09:28:17.8819139Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_elementwise.h' 2025-09-07T09:28:17.8820174Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_bias_relu.h' 2025-09-07T09:28:17.8821132Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_clamp.h' 2025-09-07T09:28:17.8822085Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_dgelu.h' 
2025-09-07T09:28:17.8823022Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_drelu.h' 2025-09-07T09:28:17.8823962Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_gelu.h' 2025-09-07T09:28:17.8827924Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_generic.h' 2025-09-07T09:28:17.8828981Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_generic_with_scaling.h' 2025-09-07T09:28:17.8830017Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_hardswish.h' 2025-09-07T09:28:17.8830978Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_leaky_relu.h' 2025-09-07T09:28:17.8831939Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_params.h' 2025-09-07T09:28:17.8832919Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_planar_complex.h' 2025-09-07T09:28:17.8833871Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_relu.h' 2025-09-07T09:28:17.8834840Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_relu0.h' 2025-09-07T09:28:17.8835869Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_residual_block.h' 2025-09-07T09:28:17.8836845Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_sigmoid.h' 2025-09-07T09:28:17.8837771Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_silu.h' 2025-09-07T09:28:17.8838759Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_tensor_broadcast.hpp' 2025-09-07T09:28:17.8839798Z #47 722.7 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/thread/linear_combination_with_elementwise.h' 2025-09-07T09:28:17.8840720Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/reduction_op.h' 2025-09-07T09:28:17.8841494Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/thread/scale_type.h' 2025-09-07T09:28:17.8842439Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op.h' 2025-09-07T09:28:17.8843584Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_complex_tensor_op_blas3.h' 2025-09-07T09:28:17.8844684Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_direct_store.h' 2025-09-07T09:28:17.8845740Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_planar_complex.h' 2025-09-07T09:28:17.8846769Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h' 2025-09-07T09:28:17.8847748Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h' 2025-09-07T09:28:17.8848791Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op_blas3.h' 2025-09-07T09:28:17.8849853Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_volta_tensor_op.h' 2025-09-07T09:28:17.8850909Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_absmax.h' 2025-09-07T09:28:17.8851959Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_broadcast.h' 2025-09-07T09:28:17.8853287Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_with_reduction.h' 2025-09-07T09:28:17.8854387Z #47 722.7 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_wmma_tensor_op.h' 2025-09-07T09:28:17.8855416Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_simt.h' 2025-09-07T09:28:17.8856446Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_tensor_op.h' 2025-09-07T09:28:17.8857550Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_volta_tensor_op.h' 2025-09-07T09:28:17.8858741Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/default_thread_map_wmma_tensor_op.h' 2025-09-07T09:28:17.8859847Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/direct_store_epilogue_iterator.h' 2025-09-07T09:28:17.8860790Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue.h' 2025-09-07T09:28:17.8861657Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_base.h' 2025-09-07T09:28:17.8862605Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_base_streamk.h' 2025-09-07T09:28:17.8863558Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_depthwise.h' 2025-09-07T09:28:17.8864521Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_direct_store.h' 2025-09-07T09:28:17.8865656Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_gemm_k_reduction.h' 2025-09-07T09:28:17.8866683Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_planar_complex.h' 2025-09-07T09:28:17.8867670Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_smem_accumulator.h' 2025-09-07T09:28:17.8868686Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_streamk_with_broadcast.h' 
2025-09-07T09:28:17.8869736Z #47 722.7 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_visitor_with_softmax.h' 2025-09-07T09:28:17.8870714Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_absmax.h' 2025-09-07T09:28:17.8871673Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_broadcast.h' 2025-09-07T09:28:17.8872748Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_reduction.h' 2025-09-07T09:28:17.8873672Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor.h' 2025-09-07T09:28:17.8874652Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_with_visitor_callbacks.h' 2025-09-07T09:28:17.8875598Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/epilogue_workspace.h' 2025-09-07T09:28:17.8876537Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/interleaved_epilogue.h' 2025-09-07T09:28:17.8877484Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/output_iterator_parameter.h' 2025-09-07T09:28:17.8878416Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/output_tile_thread_map.h' 2025-09-07T09:28:17.8879364Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator.h' 2025-09-07T09:28:17.8880517Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine.h' 2025-09-07T09:28:17.8881682Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_affine_layout_params.h' 2025-09-07T09:28:17.8882829Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_blas3.h' 2025-09-07T09:28:17.8883864Z #47 722.8 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_conv.h' 2025-09-07T09:28:17.8884958Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_direct_conv.h' 2025-09-07T09:28:17.8886045Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_params.h' 2025-09-07T09:28:17.8887144Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_predicates.h' 2025-09-07T09:28:17.8888288Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/predicated_tile_iterator_strided_dgrad.h' 2025-09-07T09:28:17.8889371Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator.h' 2025-09-07T09:28:17.8890346Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_mixed.h' 2025-09-07T09:28:17.8891374Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator_pitch_linear.h' 2025-09-07T09:28:17.8892830Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_2x.hpp' 2025-09-07T09:28:17.8893826Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_compute.hpp' 2025-09-07T09:28:17.8894820Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_load.hpp' 2025-09-07T09:28:17.8895814Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitor_store.hpp' 2025-09-07T09:28:17.8896845Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/threadblock/fusion/visitors.hpp' 2025-09-07T09:28:17.8897889Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_complex_tensor_op.h' 2025-09-07T09:28:17.8899003Z #47 722.8 adding 
'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_gaussian_complex_tensor_op.h' 2025-09-07T09:28:17.8900026Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_simt.h' 2025-09-07T09:28:17.8900962Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_tensor_op.h' 2025-09-07T09:28:17.8901948Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_volta_tensor_op.h' 2025-09-07T09:28:17.8902972Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/fragment_iterator_wmma_tensor_op.h' 2025-09-07T09:28:17.8903883Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/simt_policy.h' 2025-09-07T09:28:17.8904793Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tensor_op_policy.h' 2025-09-07T09:28:17.8905617Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_simt.h' 2025-09-07T09:28:17.8906459Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op.h' 2025-09-07T09:28:17.8907410Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_tensor_op_mixed.h' 2025-09-07T09:28:17.8908359Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_volta_tensor_op.h' 2025-09-07T09:28:17.8909289Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/tile_iterator_wmma_tensor_op.h' 2025-09-07T09:28:17.8910193Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/volta_tensor_op_policy.h' 2025-09-07T09:28:17.8911053Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/epilogue/warp/wmma_tensor_op_policy.h' 2025-09-07T09:28:17.8911956Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/detail.hpp' 2025-09-07T09:28:17.8913022Z #47 722.8 adding 
'flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/dist_gemm_universal_wrapper.hpp' 2025-09-07T09:28:17.8914108Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/device/full_barrier.hpp' 2025-09-07T09:28:17.8915076Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/detail.hpp' 2025-09-07T09:28:17.8916107Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/dist_gemm_kernel_wrapper.hpp' 2025-09-07T09:28:17.8917187Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/kernel/full_barrier.hpp' 2025-09-07T09:28:17.8918274Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_1d_schedules.hpp' 2025-09-07T09:28:17.8919488Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/experimental/distributed/schedules/dist_gemm_base_schedule.hpp' 2025-09-07T09:28:17.8920440Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/dispatch_policy.hpp' 2025-09-07T09:28:17.8921085Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/gemm.h' 2025-09-07T09:28:17.8921757Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/gemm_enumerated_types.h' 2025-09-07T09:28:17.8922561Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/group_array_problem_shape.hpp' 2025-09-07T09:28:17.8923406Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_builder.hpp' 2025-09-07T09:28:17.8924313Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_builder_decl.hpp' 2025-09-07T09:28:17.8925190Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_mma.hpp' 2025-09-07T09:28:17.8926091Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/collective_mma_decl.hpp' 2025-09-07T09:28:17.8926991Z #47 722.8 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/collective/fp8_accumulation.hpp' 2025-09-07T09:28:17.8928016Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp' 2025-09-07T09:28:17.8929150Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp' 2025-09-07T09:28:17.8930280Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_sparse_mma_warpspecialized.hpp' 2025-09-07T09:28:17.8931382Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized.hpp' 2025-09-07T09:28:17.8932597Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_blockwise_scaling.hpp' 2025-09-07T09:28:17.8933966Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_array_warpspecialized_emulated.hpp' 2025-09-07T09:28:17.9774466Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized.hpp' 2025-09-07T09:28:17.9775593Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_blockwise_scaling.hpp' 2025-09-07T09:28:17.9776754Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_mma_warpspecialized_emulated.hpp' 2025-09-07T09:28:17.9778019Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm100_sparse_mma_warpspecialized.hpp' 2025-09-07T09:28:17.9779093Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_array_tma.hpp' 2025-09-07T09:28:17.9780090Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_mma_tma.hpp' 2025-09-07T09:28:17.9781111Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_blockscaled_sparse_mma_tma.hpp' 2025-09-07T09:28:17.9782194Z #47 
722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_array_tma_blockwise_scaling.hpp' 2025-09-07T09:28:17.9783168Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_tma.hpp' 2025-09-07T09:28:17.9784110Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_mma_tma_blockwise_scaling.hpp' 2025-09-07T09:28:17.9785182Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm120_sparse_mma_tma.hpp' 2025-09-07T09:28:17.9786061Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm70_mma_twostage.hpp' 2025-09-07T09:28:17.9786955Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm80_mma_array_multistage.hpp' 2025-09-07T09:28:17.9787868Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm80_mma_multistage.hpp' 2025-09-07T09:28:17.9789032Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_rs_warpspecialized_mixed_input.hpp' 2025-09-07T09:28:17.9790244Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized.hpp' 2025-09-07T09:28:17.9791414Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8.hpp' 2025-09-07T09:28:17.9793071Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp' 2025-09-07T09:28:17.9794379Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_rs_warpspecialized.hpp' 2025-09-07T09:28:17.9795588Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_multistage_gmma_ss_warpspecialized.hpp' 2025-09-07T09:28:17.9796733Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized.hpp' 
2025-09-07T09:28:17.9798000Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_rs_warpspecialized_mixed_input.hpp' 2025-09-07T09:28:17.9799124Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss.hpp' 2025-09-07T09:28:17.9800142Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp' 2025-09-07T09:28:17.9801287Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp' 2025-09-07T09:28:17.9802528Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp' 2025-09-07T09:28:17.9803803Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized.hpp' 2025-09-07T09:28:17.9805140Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/sm90_sparse_mma_tma_gmma_ss_warpspecialized_fp8.hpp' 2025-09-07T09:28:17.9806268Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_9xBF16_umma_builder.inl' 2025-09-07T09:28:17.9807403Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_sparse_umma_builder.inl' 2025-09-07T09:28:17.9808555Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockscaled_umma_builder.inl' 2025-09-07T09:28:17.9809734Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_blockwise_umma_builder.inl' 2025-09-07T09:28:17.9810736Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_common.inl' 2025-09-07T09:28:17.9811698Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_pipeline_carveout.inl' 2025-09-07T09:28:17.9812950Z #47 722.8 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_simt_builder.inl' 2025-09-07T09:28:17.9813992Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_sparse_umma_builder.inl' 2025-09-07T09:28:17.9815045Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm100_umma_builder.inl' 2025-09-07T09:28:17.9816121Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_mma_builder.inl' 2025-09-07T09:28:17.9817294Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockscaled_sparse_mma_builder.inl' 2025-09-07T09:28:17.9818468Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_blockwise_mma_builder.inl' 2025-09-07T09:28:17.9819480Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_common.inl' 2025-09-07T09:28:17.9820438Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_mma_builder.inl' 2025-09-07T09:28:17.9821525Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm120_sparse_mma_builder.inl' 2025-09-07T09:28:17.9822523Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm1xx_common.inl' 2025-09-07T09:28:17.9823504Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm1xx_sparse_config.inl' 2025-09-07T09:28:17.9824468Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_common.inl' 2025-09-07T09:28:17.9825540Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_gmma_builder.inl' 2025-09-07T09:28:17.9826505Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_config.inl' 2025-09-07T09:28:17.9827506Z #47 722.8 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/collective/builders/sm90_sparse_gmma_builder.inl' 2025-09-07T09:28:17.9828434Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/base_grouped.h' 2025-09-07T09:28:17.9829245Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/default_gemm_configuration.h' 2025-09-07T09:28:17.9830071Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/ell_gemm.h' 2025-09-07T09:28:17.9830744Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm.h' 2025-09-07T09:28:17.9831408Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_array.h' 2025-09-07T09:28:17.9832137Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_batched.h' 2025-09-07T09:28:17.9832860Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_complex.h' 2025-09-07T09:28:17.9833594Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_grouped.h' 2025-09-07T09:28:17.9834421Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_layernorm_mainloop_fusion.h' 2025-09-07T09:28:17.9835250Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse.h' 2025-09-07T09:28:17.9836040Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_universal.h' 2025-09-07T09:28:17.9836933Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_universal_with_absmax.h' 2025-09-07T09:28:17.9837842Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_with_absmax.h' 2025-09-07T09:28:17.9838722Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_sparse_with_visitor.h' 2025-09-07T09:28:17.9839573Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_splitk_parallel.h' 2025-09-07T09:28:17.9840368Z #47 722.8 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal.h' 2025-09-07T09:28:17.9841165Z #47 722.8 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h' 2025-09-07T09:28:17.9842000Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_base.h' 2025-09-07T09:28:17.9842900Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_streamk_with_broadcast.h' 2025-09-07T09:28:17.9843854Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_with_absmax.h' 2025-09-07T09:28:17.9844762Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_universal_with_broadcast.h' 2025-09-07T09:28:17.9845632Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemm_with_k_reduction.h' 2025-09-07T09:28:17.9846378Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/gemv.h' 2025-09-07T09:28:17.9847027Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/rank_2k.h' 2025-09-07T09:28:17.9847755Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/rank_2k_grouped.h' 2025-09-07T09:28:17.9848476Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/rank_k.h' 2025-09-07T09:28:17.9849125Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/symm.h' 2025-09-07T09:28:17.9849806Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/device/trmm.h' 2025-09-07T09:28:17.9850505Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_ell_gemm.h' 2025-09-07T09:28:17.9851261Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm.h' 2025-09-07T09:28:17.9852035Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_complex.h' 2025-09-07T09:28:17.9853105Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped.h' 
2025-09-07T09:28:17.9854048Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_per_group_scale.h' 2025-09-07T09:28:17.9855117Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_grouped_softmax_mainloop_fusion.h' 2025-09-07T09:28:17.9856258Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_layernorm_mainloop_fusion.h' 2025-09-07T09:28:17.9857335Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_planar_complex_universal.h' 2025-09-07T09:28:17.9858275Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse.h' 2025-09-07T09:28:17.9859171Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal.h' 2025-09-07T09:28:17.9860182Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_universal_with_absmax.h' 2025-09-07T09:28:17.9861211Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_absmax.h' 2025-09-07T09:28:17.9862173Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_sparse_with_visitor.h' 2025-09-07T09:28:17.9863129Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_splitk_parallel.h' 2025-09-07T09:28:17.9864114Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_streamk_with_broadcast.h' 2025-09-07T09:28:17.9865235Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_universal.h' 2025-09-07T09:28:17.9866301Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_universal_with_visitor.h' 2025-09-07T09:28:17.9867215Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_absmax.h' 2025-09-07T09:28:17.9868127Z #47 722.9 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_broadcast.h' 2025-09-07T09:28:17.9869041Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_k_reduction.h' 2025-09-07T09:28:17.9869935Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemm_with_reduction.h' 2025-09-07T09:28:17.9870749Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_gemv.h' 2025-09-07T09:28:17.9871497Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k.h' 2025-09-07T09:28:17.9872322Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_complex.h' 2025-09-07T09:28:17.9873180Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_grouped.h' 2025-09-07T09:28:17.9874037Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_2k_universal.h' 2025-09-07T09:28:17.9874860Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k.h' 2025-09-07T09:28:17.9875658Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k_complex.h' 2025-09-07T09:28:17.9876513Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_rank_k_universal.h' 2025-09-07T09:28:17.9877316Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm.h' 2025-09-07T09:28:17.9878083Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm_complex.h' 2025-09-07T09:28:17.9878965Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_symm_universal.h' 2025-09-07T09:28:17.9879846Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm.h' 2025-09-07T09:28:17.9880601Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm_complex.h' 2025-09-07T09:28:17.9881586Z #47 722.9 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/kernel/default_trmm_universal.h' 2025-09-07T09:28:17.9882361Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/ell_gemm.h' 2025-09-07T09:28:17.9883039Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm.h' 2025-09-07T09:28:17.9883754Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_array.h' 2025-09-07T09:28:17.9884479Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_batched.h' 2025-09-07T09:28:17.9885235Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped.h' 2025-09-07T09:28:17.9886093Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_per_group_scale.h' 2025-09-07T09:28:17.9887006Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_problem_visitor.h' 2025-09-07T09:28:17.9887951Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_grouped_softmax_mainloop_fusion.h' 2025-09-07T09:28:17.9888929Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_layernorm_mainloop_fusion.h' 2025-09-07T09:28:17.9889750Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_params.h' 2025-09-07T09:28:17.9890503Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_pipelined.h' 2025-09-07T09:28:17.9891300Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex.h' 2025-09-07T09:28:17.9892544Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_planar_complex_array.h' 2025-09-07T09:28:17.9893514Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal.h' 2025-09-07T09:28:17.9894444Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_sparse_universal_with_absmax.h' 2025-09-07T09:28:17.9895375Z #47 722.9 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_splitk_parallel.h' 2025-09-07T09:28:17.9896366Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_streamk_with_fused_epilogue.h' 2025-09-07T09:28:17.9897294Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_transpose_operands.h' 2025-09-07T09:28:17.9898136Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal.h' 2025-09-07T09:28:17.9898927Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal.hpp' 2025-09-07T09:28:17.9899774Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_decl.h' 2025-09-07T09:28:17.9900643Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_streamk.h' 2025-09-07T09:28:17.9901532Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor.h' 2025-09-07T09:28:17.9902508Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_universal_with_visitor_streamk.h' 2025-09-07T09:28:17.9903405Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_absmax.h' 2025-09-07T09:28:17.9904256Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_fused_epilogue.h' 2025-09-07T09:28:17.9905231Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemm_with_k_reduction.h' 2025-09-07T09:28:17.9905966Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemv.h' 2025-09-07T09:28:17.9906705Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/gemv_batched_strided.h' 2025-09-07T09:28:17.9907587Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/grouped_problem_visitor.h' 2025-09-07T09:28:18.2172063Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/params_sparse_base.h' 2025-09-07T09:28:18.2173228Z #47 722.9 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/kernel/params_universal_base.h' 2025-09-07T09:28:18.2174112Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped.h' 2025-09-07T09:28:18.2175010Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_grouped_problem_visitor.h' 2025-09-07T09:28:18.2175969Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_transpose_operands.h' 2025-09-07T09:28:18.2176832Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_2k_universal.h' 2025-09-07T09:28:18.2177649Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/rank_k_universal.h' 2025-09-07T09:28:18.2178735Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized.hpp' 2025-09-07T09:28:18.2179987Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_input_transform.hpp' 2025-09-07T09:28:18.2181216Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_array_tma_warpspecialized_mma_transform.hpp' 2025-09-07T09:28:18.2182313Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp' 2025-09-07T09:28:18.2183398Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_input_transform.hpp' 2025-09-07T09:28:18.2184546Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized_mma_transform.hpp' 2025-09-07T09:28:18.2185752Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_sparse_gemm_tma_warpspecialized.hpp' 2025-09-07T09:28:18.2186745Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_static_tile_scheduler.hpp' 2025-09-07T09:28:18.2187624Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler.hpp' 
2025-09-07T09:28:18.2188507Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_group.hpp' 2025-09-07T09:28:18.2189465Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm100_tile_scheduler_stream_k.hpp' 2025-09-07T09:28:18.2190563Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm120_gemm_tma_warpspecialized_cooperative_asymmetric_dma.hpp' 2025-09-07T09:28:18.2191555Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm70_gemm.hpp' 2025-09-07T09:28:18.2192499Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm70_gemm_array.hpp' 2025-09-07T09:28:18.2193690Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_cooperative.hpp' 2025-09-07T09:28:18.2194861Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_pingpong.hpp' 2025-09-07T09:28:18.2195841Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma.hpp' 2025-09-07T09:28:18.2196732Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized.hpp' 2025-09-07T09:28:18.2197770Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp' 2025-09-07T09:28:18.2198871Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp' 2025-09-07T09:28:18.2199862Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized.hpp' 2025-09-07T09:28:18.2200862Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_cooperative.hpp' 2025-09-07T09:28:18.2201993Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_pingpong.hpp' 2025-09-07T09:28:18.2202943Z #47 722.9 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler.hpp' 2025-09-07T09:28:18.2203846Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_group.hpp' 2025-09-07T09:28:18.2204890Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sm90_tile_scheduler_stream_k.hpp' 2025-09-07T09:28:18.2205700Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm.h' 2025-09-07T09:28:18.2206471Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_absmax.h' 2025-09-07T09:28:18.2207474Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/sparse_gemm_with_visitor.h' 2025-09-07T09:28:18.2208341Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/static_tile_scheduler.hpp' 2025-09-07T09:28:18.2209201Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/symm_universal.h' 2025-09-07T09:28:18.2210020Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler.hpp' 2025-09-07T09:28:18.2210845Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler_detail.hpp' 2025-09-07T09:28:18.2211687Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/tile_scheduler_params.h' 2025-09-07T09:28:18.2212586Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/kernel/trmm_universal.h' 2025-09-07T09:28:18.2213467Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/thread/mma.h' 2025-09-07T09:28:18.2214154Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm50.h' 2025-09-07T09:28:18.2214873Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm60.h' 2025-09-07T09:28:18.2215570Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/thread/mma_sm61.h' 2025-09-07T09:28:18.2216362Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_ell_mma.h' 
2025-09-07T09:28:18.2217227Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_gemv_core.h' 2025-09-07T09:28:18.2218077Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma.h' 2025-09-07T09:28:18.2218908Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core.h' 2025-09-07T09:28:18.2219852Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_simt.h' 2025-09-07T09:28:18.2220779Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm70.h' 2025-09-07T09:28:18.2221685Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm75.h' 2025-09-07T09:28:18.2222604Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sm80.h' 2025-09-07T09:28:18.2223562Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_sparse_sm80.h' 2025-09-07T09:28:18.2224608Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_access_size.h' 2025-09-07T09:28:18.2225741Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_with_reduction.h' 2025-09-07T09:28:18.2226687Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_core_wmma.h' 2025-09-07T09:28:18.2227681Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_layernorm_mainloop_fusion.h' 2025-09-07T09:28:18.2228751Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_multistage.h' 2025-09-07T09:28:18.2229834Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_planar_complex_pipelined.h' 2025-09-07T09:28:18.2230899Z #47 722.9 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_softmax_mainloop_fusion.h' 2025-09-07T09:28:18.2231927Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_mma_with_reduction.h' 2025-09-07T09:28:18.2232904Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex.h' 2025-09-07T09:28:18.2233915Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core.h' 2025-09-07T09:28:18.2235003Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_mma_complex_core_sm80.h' 2025-09-07T09:28:18.2236063Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_multistage_trmm_complex.h' 2025-09-07T09:28:18.2237003Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_sparse_mma.h' 2025-09-07T09:28:18.2237836Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/default_trmm.h' 2025-09-07T09:28:18.2238688Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/ell_mma_multistage.h' 2025-09-07T09:28:18.2239585Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/ell_mma_pipelined.h' 2025-09-07T09:28:18.2240374Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/gemv.h' 2025-09-07T09:28:18.2241110Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/index_remat.h' 2025-09-07T09:28:18.2241884Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_base.h' 2025-09-07T09:28:18.2242690Z #47 722.9 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_blas3_multistage.h' 2025-09-07T09:28:18.2243691Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_layernorm_mainloop_fusion_multistage.h' 2025-09-07T09:28:18.2244657Z #47 723.0 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_multistage.h' 2025-09-07T09:28:18.2245461Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_pipelined.h' 2025-09-07T09:28:18.2246337Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_base.h' 2025-09-07T09:28:18.2247276Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_multistage.h' 2025-09-07T09:28:18.2248266Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_planar_complex_pipelined.h' 2025-09-07T09:28:18.2249206Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_singlestage.h' 2025-09-07T09:28:18.2250156Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_softmax_mainloop_fusion_multistage.h' 2025-09-07T09:28:18.2251116Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_sparse_base.h' 2025-09-07T09:28:18.2251965Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_sparse_multistage.h' 2025-09-07T09:28:18.2253170Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/mma_with_reduction_multistage.h' 2025-09-07T09:28:18.2254140Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle.h' 2025-09-07T09:28:18.2255078Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/threadblock/threadblock_swizzle_streamk.h' 2025-09-07T09:28:18.2256044Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_complex_tensor_op.h' 2025-09-07T09:28:18.2256957Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_sparse_tensor_op.h' 2025-09-07T09:28:18.2257843Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op.h' 2025-09-07T09:28:18.2258706Z #47 723.0 adding 
'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_tensor_op_sm80.h' 2025-09-07T09:28:18.2259805Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_with_reduction_tensor_op.h' 2025-09-07T09:28:18.2260811Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/default_mma_wmma_tensor_op.h' 2025-09-07T09:28:18.2261719Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/layernorm_scale_bias_transform.h' 2025-09-07T09:28:18.2262509Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma.h' 2025-09-07T09:28:18.2263244Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op.h' 2025-09-07T09:28:18.2264114Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_fast_f32.h' 2025-09-07T09:28:18.2265208Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_complex_tensor_op_tile_iterator_sm80.h' 2025-09-07T09:28:18.2266156Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op.h' 2025-09-07T09:28:18.2267163Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_gaussian_complex_tensor_op_tile_iterator_sm80.h' 2025-09-07T09:28:18.2268209Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_mixed_input_tensor_op.h' 2025-09-07T09:28:18.2269016Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_planar_complex.h' 2025-09-07T09:28:18.2269747Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt.h' 2025-09-07T09:28:18.2270447Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt_policy.h' 2025-09-07T09:28:18.2271232Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_simt_tile_iterator.h' 2025-09-07T09:28:18.2272030Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_sparse_tensor_op.h' 2025-09-07T09:28:18.2272793Z 
#47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op.h' 2025-09-07T09:28:18.2273558Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fast_f32.h' 2025-09-07T09:28:18.2274416Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_fragment_iterator.h' 2025-09-07T09:28:18.2275277Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_policy.h' 2025-09-07T09:28:18.2276052Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_sm70.h' 2025-09-07T09:28:18.2276910Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_access_iterator.h' 2025-09-07T09:28:18.2277847Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator.h' 2025-09-07T09:28:18.2278732Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm70.h' 2025-09-07T09:28:18.2279659Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sm80.h' 2025-09-07T09:28:18.2280578Z #47 723.0 adding 'flashinfer/data/cutlass/include/cutlass/gemm/warp/mma_tensor_op_tile_iterator_sparse.h' 2025-09-07T09:28:18.2281227Z #47 723.0 adding 'flashinfer 2025-09-07T09:28:18.2281573Z #47 723.0 [output clipped, log limit 2MiB reached] 2025-09-07T09:28:23.0795258Z #47 DONE 728.0s 2025-09-07T09:28:23.2328296Z 2025-09-07T09:28:23.2329134Z #48 [vllm-base 16/18] RUN --mount=type=cache,target=/root/.cache/uv uv pip install --system wheels/flashinfer/*.whl --verbose 2025-09-07T09:28:23.5716817Z #48 0.490 DEBUG uv 0.8.4 2025-09-07T09:28:23.7455556Z #48 0.490 DEBUG Searching for default Python interpreter in managed installations or search path 2025-09-07T09:28:23.7456419Z #48 0.490 DEBUG Searching for managed installations at `/root/.local/share/uv/python` 2025-09-07T09:28:23.7457367Z #48 0.492 DEBUG Found `cpython-3.12.11-linux-x86_64-gnu` at 
`/opt/python/cp312-cp312/bin/python` (first executable in the search path) 2025-09-07T09:28:23.7458291Z #48 0.492 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T09:28:23.7458829Z #48 0.493 DEBUG Acquired lock for `/opt/python/cp312-cp312` 2025-09-07T09:28:23.7469353Z #48 0.498 DEBUG At least one requirement is not satisfied: file:///workspace/wheels/flashinfer/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl 2025-09-07T09:28:23.7470307Z #48 0.499 DEBUG Using request timeout of 500s 2025-09-07T09:28:23.7470772Z #48 0.504 DEBUG Solving with installed Python version: 3.12.11 2025-09-07T09:28:23.7471299Z #48 0.504 DEBUG Solving with target Python version: >=3.12.11 2025-09-07T09:28:23.7471803Z #48 0.504 DEBUG Adding direct dependency: flashinfer-python* 2025-09-07T09:28:23.7472865Z #48 0.505 DEBUG Searching for a compatible version of flashinfer-python @ file:///workspace/wheels/flashinfer/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl (*) 2025-09-07T09:28:23.7474039Z #48 0.505 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: numpy* 2025-09-07T09:28:23.7474788Z #48 0.505 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: torch* 2025-09-07T09:28:23.7475623Z #48 0.505 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: ninja* 2025-09-07T09:28:23.7476448Z #48 0.505 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: requests* 2025-09-07T09:28:23.7477661Z #48 0.505 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: cuda-python<=12.9+ 2025-09-07T09:28:23.7478958Z #48 0.505 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: pynvml* 2025-09-07T09:28:23.7479716Z #48 0.505 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: einops* 2025-09-07T09:28:23.7480659Z #48 0.505 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: packaging>=24.2 2025-09-07T09:28:23.7482226Z #48 
0.505 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: nvidia-cudnn-frontend>=1.13.0 2025-09-07T09:28:23.7483533Z #48 0.505 DEBUG Adding transitive dependency for flashinfer-python==0.2.14.post1: torch>=2.9.dev0, <2.10.dev0 2025-09-07T09:28:23.7484393Z #48 0.506 DEBUG No cache entry for: https://pypi.org/simple/nvidia-cudnn-frontend/ 2025-09-07T09:28:23.7485044Z #48 0.506 DEBUG Found stale response for: https://pypi.org/simple/ninja/ 2025-09-07T09:28:23.7485716Z #48 0.506 DEBUG Sending revalidation request for: https://pypi.org/simple/ninja/ 2025-09-07T09:28:23.7486339Z #48 0.506 DEBUG No cache entry for: https://pypi.org/simple/pynvml/ 2025-09-07T09:28:23.7486938Z #48 0.506 DEBUG Found stale response for: https://pypi.org/simple/requests/ 2025-09-07T09:28:23.7487722Z #48 0.506 DEBUG Sending revalidation request for: https://pypi.org/simple/requests/ 2025-09-07T09:28:23.7488406Z #48 0.506 DEBUG Found stale response for: https://pypi.org/simple/torch/ 2025-09-07T09:28:23.7489067Z #48 0.506 DEBUG Sending revalidation request for: https://pypi.org/simple/torch/ 2025-09-07T09:28:23.7489716Z #48 0.506 DEBUG Found stale response for: https://pypi.org/simple/einops/ 2025-09-07T09:28:23.7490382Z #48 0.506 DEBUG Sending revalidation request for: https://pypi.org/simple/einops/ 2025-09-07T09:28:23.7491058Z #48 0.506 DEBUG Found stale response for: https://pypi.org/simple/packaging/ 2025-09-07T09:28:23.7491791Z #48 0.506 DEBUG Sending revalidation request for: https://pypi.org/simple/packaging/ 2025-09-07T09:28:23.7493345Z #48 0.507 DEBUG No cache entry for: https://pypi.org/simple/cuda-python/ 2025-09-07T09:28:23.7494165Z #48 0.508 DEBUG Found stale response for: https://pypi.org/simple/numpy/ 2025-09-07T09:28:23.7494823Z #48 0.508 DEBUG Sending revalidation request for: https://pypi.org/simple/numpy/ 2025-09-07T09:28:23.7495510Z #48 0.516 DEBUG Found not-modified response for: https://pypi.org/simple/ninja/ 2025-09-07T09:28:23.7496200Z #48 0.516 DEBUG 
Found not-modified response for: https://pypi.org/simple/torch/ 2025-09-07T09:28:23.7496874Z #48 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/einops/ 2025-09-07T09:28:23.7497586Z #48 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/packaging/ 2025-09-07T09:28:23.7498313Z #48 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/requests/ 2025-09-07T09:28:23.7499110Z #48 0.518 DEBUG Found not-modified response for: https://pypi.org/simple/numpy/ 2025-09-07T09:28:23.7499902Z #48 0.523 DEBUG Found installed version of ninja==1.13.0 that satisfies * 2025-09-07T09:28:23.7501509Z #48 0.523 DEBUG Found installed version of torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.9.dev0, <2.10.dev0 2025-09-07T09:28:23.7502791Z #48 0.524 DEBUG Searching for a compatible version of numpy (*) 2025-09-07T09:28:23.7503866Z #48 0.524 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T09:28:23.7504421Z #48 0.524 DEBUG Selecting: numpy==2.2.6 [installed] (installed) 2025-09-07T09:28:23.7505043Z #48 0.524 DEBUG Found installed version of einops==0.8.1 that satisfies * 2025-09-07T09:28:23.7505669Z #48 0.524 DEBUG Found installed version of packaging==25.0 that satisfies >=24.2 2025-09-07T09:28:23.7506649Z #48 0.524 DEBUG Found installed version of requests==2.32.5 that satisfies * 2025-09-07T09:28:23.7507326Z #48 0.524 DEBUG Found installed version of numpy==2.2.6 that satisfies * 2025-09-07T09:28:23.7508264Z #48 0.525 DEBUG Searching for a compatible version of torch (>=2.9.dev0, <2.10.dev0) 2025-09-07T09:28:23.7509468Z #48 0.525 DEBUG Found installed version of torch==2.9.0.dev20250906+cu128 (from file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) that satisfies >=2.9.dev0, <2.10.dev0 2025-09-07T09:28:23.7510636Z #48 0.525 DEBUG Selecting: torch==2.9.0.dev20250906+cu128 [installed] 
(installed) 2025-09-07T09:28:23.7511341Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: filelock* 2025-09-07T09:28:23.7512151Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: typing-extensions>=4.10.0 2025-09-07T09:28:23.7513122Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: setuptools{python_full_version >= '3.12'}* 2025-09-07T09:28:23.7514044Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: sympy>=1.13.3 2025-09-07T09:28:23.7514836Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: networkx>=2.5.1 2025-09-07T09:28:23.7515707Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: jinja2* 2025-09-07T09:28:23.7516430Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: fsspec>=0.8.5 2025-09-07T09:28:23.7517808Z #48 0.525 DEBUG No cache entry for: https://files.pythonhosted.org/packages/d7/4a/cac76c174bb439a0c46c9a4413fcbea5c6cabfb01879f7bbdb9fdfaed76c/pynvml-13.0.1-py3-none-any.whl.metadata 2025-09-07T09:28:23.7519473Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.93, <12.8.93+ 2025-09-07T09:28:23.7520964Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T09:28:23.7522468Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T09:28:23.7523942Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=9.10.2.21, <9.10.2.21+ 2025-09-07T09:28:23.7525375Z 
#48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.4.1, <12.8.4.1+ 2025-09-07T09:28:23.7526825Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.3.3.83, <11.3.3.83+ 2025-09-07T09:28:23.7528321Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=10.3.9.90, <10.3.9.90+ 2025-09-07T09:28:23.7529780Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=11.7.3.90, <11.7.3.90+ 2025-09-07T09:28:23.7531262Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.5.8.93, <12.5.8.93+ 2025-09-07T09:28:23.7533019Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=0.7.1, <0.7.1+ 2025-09-07T09:28:23.7534488Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=2.27.5, <2.27.5+ 2025-09-07T09:28:23.7536454Z #48 0.525 DEBUG No cache entry for: https://files.pythonhosted.org/packages/b7/b8/5f812452c653447b4c09fec3cf0c5192abab1ce18358fcfab16a70113cfa/nvidia_cudnn_frontend-1.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:28:23.7538972Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=3.3.20, <3.3.20+ 2025-09-07T09:28:23.7540437Z #48 0.525 DEBUG Adding transitive dependency for 
torch==2.9.0.dev20250906+cu128: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.90, <12.8.90+ 2025-09-07T09:28:23.7542119Z #48 0.525 DEBUG No cache entry for: https://files.pythonhosted.org/packages/24/3c/4475aebeaab9651f2e61000fbe76f91a476d371dbfbf0a1cf46e689af253/cuda_python-12.9.0-py3-none-any.whl.metadata 2025-09-07T09:28:23.7543837Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=12.8.93, <12.8.93+ 2025-09-07T09:28:23.7545438Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}>=1.13.1.3, <1.13.1.3+ 2025-09-07T09:28:23.7546863Z #48 0.525 DEBUG Adding transitive dependency for torch==2.9.0.dev20250906+cu128: pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T09:28:23.7547886Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/filelock/ 2025-09-07T09:28:23.7548594Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/filelock/ 2025-09-07T09:28:23.7549313Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T09:28:23.7550056Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/typing-extensions/ 2025-09-07T09:28:23.7550759Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/sympy/ 2025-09-07T09:28:23.7551407Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/sympy/ 2025-09-07T09:28:23.7552053Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/networkx/ 2025-09-07T09:28:23.7552726Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/networkx/ 2025-09-07T09:28:23.7553373Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/jinja2/ 2025-09-07T09:28:23.7554022Z #48 0.526 DEBUG 
Sending revalidation request for: https://pypi.org/simple/jinja2/ 2025-09-07T09:28:23.7554661Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/fsspec/ 2025-09-07T09:28:23.7555311Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/fsspec/ 2025-09-07T09:28:23.7556030Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T09:28:23.7556814Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T09:28:23.7557622Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T09:28:23.7558464Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T09:28:23.7559270Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T09:28:23.7560248Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T09:28:23.7561031Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T09:28:23.7561800Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T09:28:23.7562559Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T09:28:23.7563340Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T09:28:23.7564118Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T09:28:23.7564917Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T09:28:23.7565742Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T09:28:23.7566505Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T09:28:23.7567291Z 
#48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T09:28:23.7568076Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T09:28:23.7569666Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T09:28:23.7570475Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T09:28:23.7571271Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T09:28:23.7572092Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T09:28:23.7572968Z #48 0.526 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T09:28:23.7573736Z #48 0.526 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T09:28:23.7574511Z #48 0.527 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T09:28:23.7575341Z #48 0.527 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T09:28:23.7576119Z #48 0.527 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T09:28:23.7576865Z #48 0.527 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T09:28:23.7577654Z #48 0.527 DEBUG Found stale response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T09:28:23.7578464Z #48 0.527 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T09:28:23.7579246Z #48 0.527 DEBUG Found stale response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T09:28:23.7580035Z #48 0.527 DEBUG Sending revalidation request for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T09:28:23.7580783Z #48 0.527 DEBUG Found stale response for: 
https://pypi.org/simple/pytorch-triton/ 2025-09-07T09:28:23.7581527Z #48 0.527 DEBUG Sending revalidation request for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T09:28:23.7582246Z #48 0.527 DEBUG Found stale response for: https://pypi.org/simple/setuptools/ 2025-09-07T09:28:23.7582961Z #48 0.527 DEBUG Sending revalidation request for: https://pypi.org/simple/setuptools/ 2025-09-07T09:28:23.7583699Z #48 0.527 DEBUG Found not-modified response for: https://pypi.org/simple/filelock/ 2025-09-07T09:28:23.7584442Z #48 0.527 DEBUG Found not-modified response for: https://pypi.org/simple/typing-extensions/ 2025-09-07T09:28:23.7585485Z #48 0.527 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T09:28:23.7586685Z #48 0.527 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T09:28:23.7587667Z #48 0.527 DEBUG Found not-modified response for: https://pypi.org/simple/sympy/ 2025-09-07T09:28:23.7588518Z #48 0.528 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T09:28:23.7589447Z #48 0.528 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-runtime-cu12/ 2025-09-07T09:28:23.7590280Z #48 0.528 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-cupti-cu12/ 2025-09-07T09:28:23.7591013Z #48 0.528 DEBUG Found not-modified response for: https://pypi.org/simple/jinja2/ 2025-09-07T09:28:23.7591697Z #48 0.528 DEBUG Found not-modified response for: https://pypi.org/simple/networkx/ 2025-09-07T09:28:23.7593006Z #48 0.529 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T09:28:23.7594133Z #48 0.529 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that 
satisfies >=2.5.1 2025-09-07T09:28:23.7595081Z #48 0.529 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cudnn-cu12/ 2025-09-07T09:28:23.7595875Z #48 0.529 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cublas-cu12/ 2025-09-07T09:28:23.7596680Z #48 0.529 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-curand-cu12/ 2025-09-07T09:28:23.7597482Z #48 0.529 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufft-cu12/ 2025-09-07T09:28:23.7598288Z #48 0.529 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cuda-nvrtc-cu12/ 2025-09-07T09:28:23.7599065Z #48 0.529 DEBUG Found not-modified response for: https://pypi.org/simple/fsspec/ 2025-09-07T09:28:23.7599802Z #48 0.530 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cufile-cu12/ 2025-09-07T09:28:23.7600603Z #48 0.530 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nccl-cu12/ 2025-09-07T09:28:23.7601415Z #48 0.530 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvjitlink-cu12/ 2025-09-07T09:28:23.7602186Z #48 0.530 DEBUG Found not-modified response for: https://pypi.org/simple/setuptools/ 2025-09-07T09:28:23.7603016Z #48 0.530 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusolver-cu12/ 2025-09-07T09:28:23.7603842Z #48 0.530 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusparselt-cu12/ 2025-09-07T09:28:23.7604765Z #48 0.530 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvtx-cu12/ 2025-09-07T09:28:23.7605532Z #48 0.530 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-cusparse-cu12/ 2025-09-07T09:28:23.7606576Z #48 0.531 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.93, <12.8.93+) 2025-09-07T09:28:23.7608108Z #48 0.531 DEBUG Found installed version of 
nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.8.93, <12.8.93+ 2025-09-07T09:28:23.7609315Z #48 0.531 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T09:28:23.7610145Z #48 0.531 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.8.93: nvidia-cuda-nvrtc-cu12==12.8.93 2025-09-07T09:28:23.7611357Z #48 0.531 DEBUG Adding transitive dependency for nvidia-cuda-nvrtc-cu12==12.8.93: nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.93 2025-09-07T09:28:23.7612507Z #48 0.531 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12 (==12.8.93) 2025-09-07T09:28:23.7614022Z #48 0.531 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:28:23.7615236Z #48 0.531 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T09:28:23.7615973Z #48 0.531 DEBUG Found not-modified response for: https://pypi.org/simple/pytorch-triton/ 2025-09-07T09:28:23.7616782Z #48 0.531 DEBUG Found not-modified response for: https://pypi.org/simple/nvidia-nvshmem-cu12/ 2025-09-07T09:28:23.7618163Z #48 0.532 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T09:28:23.7619608Z #48 0.532 DEBUG Found installed version of nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:28:23.7621110Z #48 0.532 DEBUG Searching for a compatible version of nvidia-cuda-nvrtc-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.93) 2025-09-07T09:28:23.7622685Z #48 0.532 DEBUG Found installed version of 
nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:28:23.7623896Z #48 0.532 DEBUG Selecting: nvidia-cuda-nvrtc-cu12==12.8.93 [installed] (installed) 2025-09-07T09:28:23.7624905Z #48 0.532 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T09:28:23.7626605Z #48 0.532 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T09:28:23.7627839Z #48 0.532 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 2025-09-07T09:28:23.7628701Z #48 0.532 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.8.90: nvidia-cuda-runtime-cu12==12.8.90 2025-09-07T09:28:23.7629947Z #48 0.532 DEBUG Adding transitive dependency for nvidia-cuda-runtime-cu12==12.8.90: nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T09:28:23.7631057Z #48 0.532 DEBUG Searching for a compatible version of nvidia-cuda-runtime-cu12 (==12.8.90) 2025-09-07T09:28:23.7632332Z #48 0.532 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:28:23.7633530Z #48 0.532 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 2025-09-07T09:28:23.7634753Z #48 0.533 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:28:23.7636237Z #48 0.533 DEBUG Searching for a compatible version of 
nvidia-cuda-runtime-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T09:28:23.7637703Z #48 0.533 DEBUG Found installed version of nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:28:23.7638919Z #48 0.533 DEBUG Selecting: nvidia-cuda-runtime-cu12==12.8.90 [installed] (installed) 2025-09-07T09:28:23.7639904Z #48 0.533 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T09:28:23.7641581Z #48 0.533 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T09:28:23.7642801Z #48 0.533 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T09:28:23.7643652Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.8.90: nvidia-cuda-cupti-cu12==12.8.90 2025-09-07T09:28:23.7644862Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cuda-cupti-cu12==12.8.90: nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T09:28:23.7645930Z #48 0.533 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12 (==12.8.90) 2025-09-07T09:28:23.7647147Z #48 0.533 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:28:23.7648331Z #48 0.533 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T09:28:23.7649500Z #48 0.533 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from 
file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:28:23.7650987Z #48 0.533 DEBUG Searching for a compatible version of nvidia-cuda-cupti-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T09:28:23.7652559Z #48 0.533 DEBUG Found installed version of nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:28:23.7653939Z #48 0.533 DEBUG Selecting: nvidia-cuda-cupti-cu12==12.8.90 [installed] (installed) 2025-09-07T09:28:23.7654934Z #48 0.533 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=9.10.2.21, <9.10.2.21+) 2025-09-07T09:28:23.7656353Z #48 0.533 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=9.10.2.21, <9.10.2.21+ 2025-09-07T09:28:23.7657475Z #48 0.533 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T09:28:23.7658278Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12==9.10.2.21 2025-09-07T09:28:23.7659424Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==9.10.2.21 2025-09-07T09:28:23.7660473Z #48 0.533 DEBUG Searching for a compatible version of nvidia-cudnn-cu12 (==9.10.2.21) 2025-09-07T09:28:23.7661595Z #48 0.533 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T09:28:23.7662672Z #48 0.533 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T09:28:23.7663748Z #48 0.533 DEBUG Found installed 
version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T09:28:23.7665015Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T09:28:23.7666013Z #48 0.533 DEBUG Searching for a compatible version of nvidia-cudnn-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==9.10.2.21) 2025-09-07T09:28:23.7667276Z #48 0.533 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies * 2025-09-07T09:28:23.7668667Z #48 0.533 DEBUG Found installed version of nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==9.10.2.21 2025-09-07T09:28:23.7669705Z #48 0.533 DEBUG Selecting: nvidia-cudnn-cu12==9.10.2.21 [installed] (installed) 2025-09-07T09:28:23.7670423Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cudnn-cu12==9.10.2.21: nvidia-cublas-cu12* 2025-09-07T09:28:23.7671488Z #48 0.533 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.4.1, <12.8.4.1+) 2025-09-07T09:28:23.7672875Z #48 0.533 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=12.8.4.1, <12.8.4.1+ 2025-09-07T09:28:23.7673946Z #48 0.533 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T09:28:23.7674719Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.8.4.1: nvidia-cublas-cu12==12.8.4.1 2025-09-07T09:28:23.7675844Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cublas-cu12==12.8.4.1: nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.4.1 2025-09-07T09:28:23.7676847Z #48 0.533 DEBUG 
Searching for a compatible version of nvidia-cublas-cu12 (==12.8.4.1) 2025-09-07T09:28:23.7677917Z #48 0.533 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T09:28:23.7679421Z #48 0.533 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T09:28:23.7680460Z #48 0.533 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T09:28:23.7681373Z #48 0.533 DEBUG Searching for a compatible version of nvidia-cublas-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.4.1) 2025-09-07T09:28:23.7682679Z #48 0.533 DEBUG Found installed version of nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==12.8.4.1 2025-09-07T09:28:23.7684107Z #48 0.533 DEBUG Selecting: nvidia-cublas-cu12==12.8.4.1 [installed] (installed) 2025-09-07T09:28:23.7685054Z #48 0.533 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.3.3.83, <11.3.3.83+) 2025-09-07T09:28:23.7686521Z #48 0.533 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=11.3.3.83, <11.3.3.83+ 2025-09-07T09:28:23.7687693Z #48 0.533 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T09:28:23.7688482Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-cufft-cu12==11.3.3.83 2025-09-07T09:28:23.7689600Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.3.3.83 2025-09-07T09:28:23.7690611Z #48 
0.533 DEBUG Searching for a compatible version of nvidia-cufft-cu12 (==11.3.3.83) 2025-09-07T09:28:23.7691757Z #48 0.533 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T09:28:23.7693873Z #48 0.533 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T09:28:23.7695039Z #48 0.533 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T09:28:23.7695807Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-nvjitlink-cu12* 2025-09-07T09:28:23.7696860Z #48 0.533 DEBUG Searching for a compatible version of nvidia-cufft-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.3.3.83) 2025-09-07T09:28:23.7698297Z #48 0.533 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies * 2025-09-07T09:28:23.7700053Z #48 0.533 DEBUG Found installed version of nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==11.3.3.83 2025-09-07T09:28:23.7701610Z #48 0.533 DEBUG Selecting: nvidia-cufft-cu12==11.3.3.83 [installed] (installed) 2025-09-07T09:28:23.7702381Z #48 0.533 DEBUG Adding transitive dependency for nvidia-cufft-cu12==11.3.3.83: nvidia-nvjitlink-cu12* 2025-09-07T09:28:23.7706414Z #48 0.533 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=10.3.9.90, <10.3.9.90+) 2025-09-07T09:28:23.7709021Z #48 0.533 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from 
file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=10.3.9.90, <10.3.9.90+ 2025-09-07T09:28:23.7710509Z #48 0.533 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T09:28:23.7712635Z #48 0.533 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.9.90: nvidia-curand-cu12==10.3.9.90 2025-09-07T09:28:23.7713874Z #48 0.533 DEBUG Adding transitive dependency for nvidia-curand-cu12==10.3.9.90: nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==10.3.9.90 2025-09-07T09:28:23.7714962Z #48 0.533 DEBUG Searching for a compatible version of nvidia-curand-cu12 (==10.3.9.90) 2025-09-07T09:28:23.7716049Z #48 0.533 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T09:28:23.7717501Z #48 0.533 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T09:28:23.7718555Z #48 0.533 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T09:28:23.7719470Z #48 0.534 DEBUG Searching for a compatible version of nvidia-curand-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==10.3.9.90) 2025-09-07T09:28:23.7720778Z #48 0.534 DEBUG Found installed version of nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==10.3.9.90 2025-09-07T09:28:23.7721831Z #48 0.534 DEBUG Selecting: nvidia-curand-cu12==10.3.9.90 [installed] (installed) 2025-09-07T09:28:23.7722784Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=11.7.3.90, <11.7.3.90+) 2025-09-07T09:28:23.7724263Z #48 0.534 DEBUG Found installed version of 
nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies >=11.7.3.90, <11.7.3.90+ 2025-09-07T09:28:23.7725533Z #48 0.534 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T09:28:23.7726341Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusolver-cu12==11.7.3.90 2025-09-07T09:28:23.7727939Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==11.7.3.90 2025-09-07T09:28:23.7729007Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cusolver-cu12 (==11.7.3.90) 2025-09-07T09:28:23.7730692Z #48 0.534 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T09:28:23.7732644Z #48 0.534 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T09:28:23.7733931Z #48 0.534 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T09:28:23.7735099Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cublas-cu12* 2025-09-07T09:28:23.7736464Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-nvjitlink-cu12* 2025-09-07T09:28:23.7737437Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusparse-cu12* 2025-09-07T09:28:23.7738505Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cusolver-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==11.7.3.90) 2025-09-07T09:28:23.7740620Z #48 0.534 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from 
file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies * 2025-09-07T09:28:23.7742245Z #48 0.534 DEBUG Found installed version of nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) that satisfies ==11.7.3.90 2025-09-07T09:28:23.7743623Z #48 0.534 DEBUG Selecting: nvidia-cusolver-cu12==11.7.3.90 [installed] (installed) 2025-09-07T09:28:23.7744975Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cublas-cu12* 2025-09-07T09:28:23.7745899Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-nvjitlink-cu12* 2025-09-07T09:28:23.7746789Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusolver-cu12==11.7.3.90: nvidia-cusparse-cu12* 2025-09-07T09:28:23.7748124Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.5.8.93, <12.5.8.93+) 2025-09-07T09:28:23.7750813Z #48 0.534 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.5.8.93, <12.5.8.93+ 2025-09-07T09:28:23.7752273Z #48 0.534 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T09:28:23.7753927Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-cusparse-cu12==12.5.8.93 2025-09-07T09:28:23.7756152Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.5.8.93 2025-09-07T09:28:23.7757611Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cusparse-cu12 (==12.5.8.93) 2025-09-07T09:28:23.7759682Z #48 0.534 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from 
file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T09:28:23.7761429Z #48 0.534 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T09:28:23.7762594Z #48 0.534 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T09:28:23.7763373Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-nvjitlink-cu12* 2025-09-07T09:28:23.7764413Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cusparse-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.5.8.93) 2025-09-07T09:28:23.7765869Z #48 0.534 DEBUG Found installed version of nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.5.8.93 2025-09-07T09:28:23.7767050Z #48 0.534 DEBUG Selecting: nvidia-cusparse-cu12==12.5.8.93 [installed] (installed) 2025-09-07T09:28:23.7767822Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusparse-cu12==12.5.8.93: nvidia-nvjitlink-cu12* 2025-09-07T09:28:23.7768896Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=0.7.1, <0.7.1+) 2025-09-07T09:28:23.7770310Z #48 0.534 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies >=0.7.1, <0.7.1+ 2025-09-07T09:28:23.7771458Z #48 0.534 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T09:28:23.7772264Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12==0.7.1 2025-09-07T09:28:23.7773816Z #48 0.534 DEBUG Adding transitive dependency for 
nvidia-cusparselt-cu12==0.7.1: nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==0.7.1 2025-09-07T09:28:23.7774918Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12 (==0.7.1) 2025-09-07T09:28:23.7776057Z #48 0.534 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T09:28:23.7777743Z #48 0.534 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T09:28:23.7779183Z #48 0.534 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T09:28:23.7780639Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cusparselt-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==0.7.1) 2025-09-07T09:28:23.7782024Z #48 0.534 DEBUG Found installed version of nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) that satisfies ==0.7.1 2025-09-07T09:28:23.7783131Z #48 0.534 DEBUG Selecting: nvidia-cusparselt-cu12==0.7.1 [installed] (installed) 2025-09-07T09:28:23.7784085Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=2.27.5, <2.27.5+) 2025-09-07T09:28:23.7786160Z #48 0.534 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=2.27.5, <2.27.5+ 2025-09-07T09:28:23.7787446Z #48 0.534 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T09:28:23.7788180Z #48 0.534 DEBUG Adding transitive dependency for nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12==2.27.5 2025-09-07T09:28:23.7789278Z #48 0.534 DEBUG Adding transitive dependency for 
nvidia-nccl-cu12==2.27.5: nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==2.27.5 2025-09-07T09:28:23.7790515Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nccl-cu12 (==2.27.5) 2025-09-07T09:28:23.7792330Z #48 0.534 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T09:28:23.7793476Z #48 0.534 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T09:28:23.7794975Z #48 0.534 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T09:28:23.7796364Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nccl-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==2.27.5) 2025-09-07T09:28:23.7797754Z #48 0.534 DEBUG Found installed version of nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==2.27.5 2025-09-07T09:28:23.7798853Z #48 0.534 DEBUG Selecting: nvidia-nccl-cu12==2.27.5 [installed] (installed) 2025-09-07T09:28:23.7799802Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=3.3.20, <3.3.20+) 2025-09-07T09:28:23.7801304Z #48 0.534 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=3.3.20, <3.3.20+ 2025-09-07T09:28:23.7802504Z #48 0.534 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T09:28:23.7803397Z #48 0.534 DEBUG Adding transitive dependency for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12==3.3.20 2025-09-07T09:28:23.7804652Z #48 0.534 DEBUG Adding transitive dependency 
for nvidia-nvshmem-cu12==3.3.20: nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.3.20 2025-09-07T09:28:23.7805674Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12 (==3.3.20) 2025-09-07T09:28:23.7806846Z #48 0.534 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T09:28:23.7808460Z #48 0.534 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T09:28:23.7809636Z #48 0.534 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T09:28:23.7810966Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nvshmem-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.3.20) 2025-09-07T09:28:23.7813070Z #48 0.534 DEBUG Found installed version of nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==3.3.20 2025-09-07T09:28:23.7814242Z #48 0.534 DEBUG Selecting: nvidia-nvshmem-cu12==3.3.20 [installed] (installed) 2025-09-07T09:28:23.7815183Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.90, <12.8.90+) 2025-09-07T09:28:23.7816651Z #48 0.534 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=12.8.90, <12.8.90+ 2025-09-07T09:28:23.7817815Z #48 0.534 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T09:28:23.7818558Z #48 0.534 DEBUG Adding transitive dependency for nvidia-nvtx-cu12==12.8.90: nvidia-nvtx-cu12==12.8.90 2025-09-07T09:28:23.7819667Z #48 0.534 DEBUG 
Adding transitive dependency for nvidia-nvtx-cu12==12.8.90: nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.90 2025-09-07T09:28:23.7820673Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nvtx-cu12 (==12.8.90) 2025-09-07T09:28:23.7821889Z #48 0.534 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:28:23.7823015Z #48 0.534 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T09:28:23.7824130Z #48 0.534 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:28:23.7825606Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nvtx-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.90) 2025-09-07T09:28:23.7826960Z #48 0.534 DEBUG Found installed version of nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==12.8.90 2025-09-07T09:28:23.7828044Z #48 0.534 DEBUG Selecting: nvidia-nvtx-cu12==12.8.90 [installed] (installed) 2025-09-07T09:28:23.7829444Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=12.8.93, <12.8.93+) 2025-09-07T09:28:23.7830953Z #48 0.534 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies >=12.8.93, <12.8.93+ 2025-09-07T09:28:23.7832173Z #48 0.534 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T09:28:23.7833027Z #48 0.534 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.8.93: nvidia-nvjitlink-cu12==12.8.93 
2025-09-07T09:28:23.7834212Z #48 0.534 DEBUG Adding transitive dependency for nvidia-nvjitlink-cu12==12.8.93: nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==12.8.93 2025-09-07T09:28:23.7835278Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12 (==12.8.93) 2025-09-07T09:28:23.7836489Z #48 0.534 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:28:23.7838165Z #48 0.534 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:28:23.7839375Z #48 0.534 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T09:28:23.7840301Z #48 0.534 DEBUG Searching for a compatible version of nvidia-nvjitlink-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==12.8.93) 2025-09-07T09:28:23.7841780Z #48 0.534 DEBUG Found installed version of nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) that satisfies ==12.8.93 2025-09-07T09:28:23.7842953Z #48 0.534 DEBUG Selecting: nvidia-nvjitlink-cu12==12.8.93 [installed] (installed) 2025-09-07T09:28:23.7843896Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (>=1.13.1.3, <1.13.1.3+) 2025-09-07T09:28:23.7845371Z #48 0.534 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies >=1.13.1.3, <1.13.1.3+ 2025-09-07T09:28:23.7846545Z #48 0.534 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T09:28:23.7847310Z #48 0.534 DEBUG 
Adding transitive dependency for nvidia-cufile-cu12==1.13.1.3: nvidia-cufile-cu12==1.13.1.3 2025-09-07T09:28:23.7848441Z #48 0.534 DEBUG Adding transitive dependency for nvidia-cufile-cu12==1.13.1.3: nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'}==1.13.1.3 2025-09-07T09:28:23.7849450Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cufile-cu12 (==1.13.1.3) 2025-09-07T09:28:23.7850644Z #48 0.534 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T09:28:23.7852268Z #48 0.534 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T09:28:23.7853732Z #48 0.534 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T09:28:23.7854679Z #48 0.534 DEBUG Searching for a compatible version of nvidia-cufile-cu12{platform_machine == 'x86_64' and sys_platform == 'linux'} (==1.13.1.3) 2025-09-07T09:28:23.7856117Z #48 0.534 DEBUG Found installed version of nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) that satisfies ==1.13.1.3 2025-09-07T09:28:23.7857272Z #48 0.534 DEBUG Selecting: nvidia-cufile-cu12==1.13.1.3 [installed] (installed) 2025-09-07T09:28:23.7858227Z #48 0.534 DEBUG Searching for a compatible version of pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T09:28:23.7862552Z #48 0.534 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:28:23.7866195Z #48 0.534 DEBUG Selecting: 
pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T09:28:23.7868432Z #48 0.534 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton==3.4.0+gitf7888497 2025-09-07T09:28:23.7869653Z #48 0.534 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'}==3.4.0+gitf7888497 2025-09-07T09:28:23.7870752Z #48 0.534 DEBUG Searching for a compatible version of pytorch-triton (==3.4.0+gitf7888497) 2025-09-07T09:28:23.7872078Z #48 0.534 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:28:23.7873351Z #48 0.534 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T09:28:23.7874680Z #48 0.534 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:28:23.7876089Z #48 0.535 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T09:28:23.7877103Z #48 0.535 DEBUG Searching for a compatible version of pytorch-triton{platform_machine == 'x86_64' and sys_platform == 'linux'} (==3.4.0+gitf7888497) 2025-09-07T09:28:23.7878646Z #48 0.535 DEBUG Found installed version of pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) that satisfies ==3.4.0+gitf7888497 2025-09-07T09:28:23.7879921Z #48 0.535 DEBUG Selecting: pytorch-triton==3.4.0+gitf7888497 [installed] (installed) 2025-09-07T09:28:23.7881793Z #48 0.535 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies >=40.8.0 
2025-09-07T09:28:23.7882901Z #48 0.535 DEBUG Adding transitive dependency for pytorch-triton==3.4.0+gitf7888497: setuptools>=40.8.0 2025-09-07T09:28:23.7883578Z #48 0.535 DEBUG Searching for a compatible version of ninja (*) 2025-09-07T09:28:23.7884480Z #48 0.535 DEBUG Found installed version of ninja==1.13.0 that satisfies * 2025-09-07T09:28:23.7885017Z #48 0.535 DEBUG Selecting: ninja==1.13.0 [installed] (installed) 2025-09-07T09:28:23.7885550Z #48 0.535 DEBUG Searching for a compatible version of requests (*) 2025-09-07T09:28:23.7886173Z #48 0.535 DEBUG Found installed version of requests==2.32.5 that satisfies * 2025-09-07T09:28:23.7886734Z #48 0.535 DEBUG Selecting: requests==2.32.5 [installed] (installed) 2025-09-07T09:28:23.7887495Z #48 0.535 DEBUG Adding transitive dependency for requests==2.32.5: charset-normalizer>=2, <4 2025-09-07T09:28:23.7888192Z #48 0.535 DEBUG Adding transitive dependency for requests==2.32.5: idna>=2.5, <4 2025-09-07T09:28:23.7889309Z #48 0.535 DEBUG Adding transitive dependency for requests==2.32.5: urllib3>=1.21.1, <3 2025-09-07T09:28:23.7890015Z #48 0.535 DEBUG Adding transitive dependency for requests==2.32.5: certifi>=2017.4.17 2025-09-07T09:28:23.7890674Z #48 0.535 DEBUG Searching for a compatible version of cuda-python (<=12.9+) 2025-09-07T09:28:23.7891379Z #48 0.535 DEBUG Selecting: cuda-python==12.9.0 [compatible] (cuda_python-12.9.0-py3-none-any.whl) 2025-09-07T09:28:23.7892595Z #48 0.535 DEBUG Found stale response for: https://pypi.org/simple/idna/ 2025-09-07T09:28:23.7893669Z #48 0.535 DEBUG Sending revalidation request for: https://pypi.org/simple/idna/ 2025-09-07T09:28:23.7894325Z #48 0.535 DEBUG Found stale response for: https://pypi.org/simple/urllib3/ 2025-09-07T09:28:23.7895006Z #48 0.535 DEBUG Sending revalidation request for: https://pypi.org/simple/urllib3/ 2025-09-07T09:28:23.7895681Z #48 0.535 DEBUG Found stale response for: https://pypi.org/simple/certifi/ 2025-09-07T09:28:23.7896606Z #48 0.535 DEBUG Sending 
revalidation request for: https://pypi.org/simple/certifi/ 2025-09-07T09:28:23.7897823Z #48 0.536 DEBUG Found stale response for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T09:28:23.7898608Z #48 0.536 DEBUG Sending revalidation request for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T09:28:23.7899665Z #48 0.536 DEBUG Adding transitive dependency for cuda-python==12.9.0: cuda-bindings>=12.9.0, <12.10.dev0 2025-09-07T09:28:23.7900371Z #48 0.536 DEBUG Searching for a compatible version of pynvml (*) 2025-09-07T09:28:23.7901009Z #48 0.536 DEBUG Selecting: pynvml==13.0.1 [compatible] (pynvml-13.0.1-py3-none-any.whl) 2025-09-07T09:28:23.7901738Z #48 0.536 DEBUG Adding transitive dependency for pynvml==13.0.1: nvidia-ml-py>=12.0.0 2025-09-07T09:28:23.7902363Z #48 0.536 DEBUG Searching for a compatible version of einops (*) 2025-09-07T09:28:23.7902937Z #48 0.536 DEBUG Found installed version of einops==0.8.1 that satisfies * 2025-09-07T09:28:23.7903493Z #48 0.536 DEBUG Selecting: einops==0.8.1 [installed] (installed) 2025-09-07T09:28:23.7904128Z #48 0.536 DEBUG Searching for a compatible version of packaging (>=24.2) 2025-09-07T09:28:23.7904942Z #48 0.536 DEBUG Found installed version of packaging==25.0 that satisfies >=24.2 2025-09-07T09:28:23.7905819Z #48 0.536 DEBUG Selecting: packaging==25.0 [installed] (installed) 2025-09-07T09:28:23.7907043Z #48 0.536 DEBUG Searching for a compatible version of nvidia-cudnn-frontend (>=1.13.0) 2025-09-07T09:28:23.7908176Z #48 0.536 DEBUG Selecting: nvidia-cudnn-frontend==1.14.1 [compatible] (nvidia_cudnn_frontend-1.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:28:23.7909545Z #48 0.536 DEBUG Searching for a compatible version of filelock (*) 2025-09-07T09:28:23.7910759Z #48 0.536 DEBUG Found installed version of filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) that satisfies * 2025-09-07T09:28:23.7911771Z #48 0.536 DEBUG Selecting: filelock==3.19.1 
[installed] (installed) 2025-09-07T09:28:23.7912409Z #48 0.536 DEBUG Searching for a compatible version of typing-extensions (>=4.10.0) 2025-09-07T09:28:23.7913592Z #48 0.536 DEBUG Found installed version of typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) that satisfies >=4.10.0 2025-09-07T09:28:23.7914933Z #48 0.536 DEBUG Selecting: typing-extensions==4.14.1 [installed] (installed) 2025-09-07T09:28:23.7915538Z #48 0.536 DEBUG No cache entry for: https://pypi.org/simple/cuda-bindings/ 2025-09-07T09:28:23.7916340Z #48 0.536 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (*) 2025-09-07T09:28:23.7917299Z #48 0.536 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies * 2025-09-07T09:28:23.7918113Z #48 0.536 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T09:28:23.7918754Z #48 0.536 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools==78.1.0 2025-09-07T09:28:23.7919584Z #48 0.536 DEBUG Adding transitive dependency for setuptools==78.1.0: setuptools{python_full_version >= '3.12'}==78.1.0 2025-09-07T09:28:23.7920378Z #48 0.536 DEBUG Searching for a compatible version of setuptools (==78.1.0) 2025-09-07T09:28:23.7921261Z #48 0.536 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T09:28:23.7922099Z #48 0.536 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T09:28:23.7922809Z #48 0.536 DEBUG Searching for a compatible version of setuptools{python_full_version >= '3.12'} (==78.1.0) 2025-09-07T09:28:23.7923530Z #48 0.536 DEBUG No cache entry for: https://pypi.org/simple/nvidia-ml-py/ 2025-09-07T09:28:23.7924412Z #48 0.536 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T09:28:23.7925263Z 
#48 0.536 DEBUG Selecting: setuptools==78.1.0 [installed] (installed) 2025-09-07T09:28:23.7925807Z #48 0.536 DEBUG Searching for a compatible version of sympy (>=1.13.3) 2025-09-07T09:28:23.7926645Z #48 0.536 DEBUG Found installed version of sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) that satisfies >=1.13.3 2025-09-07T09:28:23.7927405Z #48 0.536 DEBUG Selecting: sympy==1.14.0 [installed] (installed) 2025-09-07T09:28:23.7928233Z #48 0.536 DEBUG Found installed version of setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) that satisfies ==78.1.0 2025-09-07T09:28:23.7929149Z #48 0.536 DEBUG Adding transitive dependency for sympy==1.14.0: mpmath>=1.1.0, <1.4 2025-09-07T09:28:23.7930212Z #48 0.536 DEBUG Searching for a compatible version of networkx (>=2.5.1) 2025-09-07T09:28:23.7931033Z #48 0.536 DEBUG Found installed version of networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) that satisfies >=2.5.1 2025-09-07T09:28:23.7931803Z #48 0.536 DEBUG Selecting: networkx==3.5 [installed] (installed) 2025-09-07T09:28:23.7932322Z #48 0.536 DEBUG Searching for a compatible version of jinja2 (*) 2025-09-07T09:28:23.7933378Z #48 0.536 DEBUG Found installed version of jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) that satisfies * 2025-09-07T09:28:23.7934209Z #48 0.536 DEBUG Selecting: jinja2==3.1.6 [installed] (installed) 2025-09-07T09:28:23.7934807Z #48 0.536 DEBUG Adding transitive dependency for jinja2==3.1.6: markupsafe>=2.0 2025-09-07T09:28:23.7935427Z #48 0.536 DEBUG Searching for a compatible version of fsspec (>=0.8.5) 2025-09-07T09:28:23.7936291Z #48 0.536 DEBUG Found installed version of fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) that satisfies >=0.8.5 2025-09-07T09:28:23.7937117Z #48 0.536 DEBUG Selecting: fsspec==2025.7.0 [installed] (installed) 2025-09-07T09:28:23.7937714Z #48 0.536 DEBUG Found stale response for: https://pypi.org/simple/mpmath/ 
2025-09-07T09:28:23.7938386Z #48 0.536 DEBUG Sending revalidation request for: https://pypi.org/simple/mpmath/ 2025-09-07T09:28:23.7939061Z #48 0.536 DEBUG Found not-modified response for: https://pypi.org/simple/idna/ 2025-09-07T09:28:23.7939745Z #48 0.536 DEBUG Found not-modified response for: https://pypi.org/simple/urllib3/ 2025-09-07T09:28:23.7941430Z #48 0.536 DEBUG Found installed version of idna==3.10 that satisfies >=2.5, <4 2025-09-07T09:28:23.7942883Z #48 0.537 DEBUG Found stale response for: https://pypi.org/simple/markupsafe/ 2025-09-07T09:28:23.7943624Z #48 0.537 DEBUG Sending revalidation request for: https://pypi.org/simple/markupsafe/ 2025-09-07T09:28:23.7944415Z #48 0.537 DEBUG Found installed version of urllib3==2.5.0 that satisfies >=1.21.1, <3 2025-09-07T09:28:23.7945272Z #48 0.537 DEBUG Found not-modified response for: https://pypi.org/simple/charset-normalizer/ 2025-09-07T09:28:23.7945994Z #48 0.537 DEBUG Found not-modified response for: https://pypi.org/simple/certifi/ 2025-09-07T09:28:23.7946671Z #48 0.538 DEBUG Found not-modified response for: https://pypi.org/simple/mpmath/ 2025-09-07T09:28:23.7947330Z #48 0.539 DEBUG Searching for a compatible version of charset-normalizer (>=2, <4) 2025-09-07T09:28:23.7948046Z #48 0.539 DEBUG Found installed version of charset-normalizer==3.4.3 that satisfies >=2, <4 2025-09-07T09:28:23.7948740Z #48 0.539 DEBUG Selecting: charset-normalizer==3.4.3 [installed] (installed) 2025-09-07T09:28:23.7949383Z #48 0.539 DEBUG Found installed version of certifi==2025.8.3 that satisfies >=2017.4.17 2025-09-07T09:28:23.7950101Z #48 0.539 DEBUG Found installed version of charset-normalizer==3.4.3 that satisfies >=2, <4 2025-09-07T09:28:23.7950822Z #48 0.539 DEBUG Found not-modified response for: https://pypi.org/simple/markupsafe/ 2025-09-07T09:28:23.7951720Z #48 0.540 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 
2025-09-07T09:28:23.7952540Z #48 0.540 DEBUG Searching for a compatible version of idna (>=2.5, <4) 2025-09-07T09:28:23.7953119Z #48 0.540 DEBUG Found installed version of idna==3.10 that satisfies >=2.5, <4 2025-09-07T09:28:23.7953676Z #48 0.540 DEBUG Selecting: idna==3.10 [installed] (installed) 2025-09-07T09:28:23.7954212Z #48 0.540 DEBUG Searching for a compatible version of urllib3 (>=1.21.1, <3) 2025-09-07T09:28:23.7954890Z #48 0.540 DEBUG Found installed version of urllib3==2.5.0 that satisfies >=1.21.1, <3 2025-09-07T09:28:23.7955478Z #48 0.540 DEBUG Selecting: urllib3==2.5.0 [installed] (installed) 2025-09-07T09:28:23.7956032Z #48 0.540 DEBUG Searching for a compatible version of certifi (>=2017.4.17) 2025-09-07T09:28:23.7956941Z #48 0.540 DEBUG Found installed version of certifi==2025.8.3 that satisfies >=2017.4.17 2025-09-07T09:28:23.7958099Z #48 0.540 DEBUG Selecting: certifi==2025.8.3 [installed] (installed) 2025-09-07T09:28:23.7958775Z #48 0.540 DEBUG Searching for a compatible version of cuda-bindings (>=12.9.0, <12.10.dev0) 2025-09-07T09:28:23.7959721Z #48 0.540 DEBUG Selecting: cuda-bindings==12.9.2 [compatible] (cuda_bindings-12.9.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:28:23.7961444Z #48 0.540 DEBUG No cache entry for: https://files.pythonhosted.org/packages/26/15/3dbe02186dc0daaa8410aa1c1c368d36967b88035ce1cea663e9ba11312a/cuda_bindings-12.9.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata 2025-09-07T09:28:23.7963605Z #48 0.540 DEBUG No cache entry for: https://files.pythonhosted.org/packages/f9/96/88a5cb161c61cab2ee65b5aa61e612901fbcb1660024f0ccb26fcb02a17c/nvidia_ml_py-13.580.65-py3-none-any.whl.metadata 2025-09-07T09:28:23.7965268Z #48 0.540 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T09:28:23.7966423Z #48 0.543 DEBUG Adding transitive dependency for 
cuda-bindings==12.9.2: cuda-pathfinder>=1.1, <2.dev0 2025-09-07T09:28:23.7967157Z #48 0.543 DEBUG Searching for a compatible version of nvidia-ml-py (>=12.0.0) 2025-09-07T09:28:23.7967870Z #48 0.543 DEBUG Selecting: nvidia-ml-py==13.580.65 [compatible] (nvidia_ml_py-13.580.65-py3-none-any.whl) 2025-09-07T09:28:23.7968588Z #48 0.543 DEBUG Searching for a compatible version of mpmath (>=1.1.0, <1.4) 2025-09-07T09:28:23.7969422Z #48 0.543 DEBUG Found installed version of mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) that satisfies >=1.1.0, <1.4 2025-09-07T09:28:23.7970221Z #48 0.543 DEBUG Selecting: mpmath==1.3.0 [installed] (installed) 2025-09-07T09:28:23.7970761Z #48 0.543 DEBUG Searching for a compatible version of markupsafe (>=2.0) 2025-09-07T09:28:23.7971365Z #48 0.543 DEBUG No cache entry for: https://pypi.org/simple/cuda-pathfinder/ 2025-09-07T09:28:23.7972719Z #48 0.543 DEBUG Found installed version of markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) that satisfies >=2.0 2025-09-07T09:28:23.7973933Z #48 0.543 DEBUG Selecting: markupsafe==3.0.2 [installed] (installed) 2025-09-07T09:28:23.7974578Z #48 0.545 DEBUG Searching for a compatible version of cuda-pathfinder (>=1.1, <2.dev0) 2025-09-07T09:28:23.7975372Z #48 0.545 DEBUG Selecting: cuda-pathfinder==1.2.1 [compatible] (cuda_pathfinder-1.2.1-py3-none-any.whl) 2025-09-07T09:28:23.7976779Z #48 0.545 DEBUG No cache entry for: https://files.pythonhosted.org/packages/22/54/6231878f6fc490f222c87190ce12196b67b7700b30818882a87f478e4944/cuda_pathfinder-1.2.1-py3-none-any.whl.metadata 2025-09-07T09:28:23.7981439Z #48 0.549 DEBUG Tried 42 versions: certifi 1, charset-normalizer 1, cuda-bindings 1, cuda-pathfinder 1, cuda-python 1, einops 1, filelock 1, flashinfer-python 1, fsspec 1, idna 1, jinja2 1, markupsafe 1, mpmath 1, networkx 1, ninja 1, numpy 1, nvidia-cublas-cu12 1, nvidia-cuda-cupti-cu12 1, nvidia-cuda-nvrtc-cu12 1, 
nvidia-cuda-runtime-cu12 1, nvidia-cudnn-cu12 1, nvidia-cudnn-frontend 1, nvidia-cufft-cu12 1, nvidia-cufile-cu12 1, nvidia-curand-cu12 1, nvidia-cusolver-cu12 1, nvidia-cusparse-cu12 1, nvidia-cusparselt-cu12 1, nvidia-ml-py 1, nvidia-nccl-cu12 1, nvidia-nvjitlink-cu12 1, nvidia-nvshmem-cu12 1, nvidia-nvtx-cu12 1, packaging 1, pynvml 1, pytorch-triton 1, requests 1, setuptools 1, sympy 1, torch 1, typing-extensions 1, urllib3 1 2025-09-07T09:28:23.7985021Z #48 0.549 DEBUG marker environment resolution took 0.045s 2025-09-07T09:28:23.7985496Z #48 0.549 Resolved 42 packages in 47ms 2025-09-07T09:28:23.7986312Z #48 0.550 DEBUG Requirement already installed: nvidia-cublas-cu12==12.8.4.1 (from file:///dist/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:28:23.7987726Z #48 0.550 DEBUG Identified uncached distribution: flashinfer-python @ file:///workspace/wheels/flashinfer/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl 2025-09-07T09:28:23.7989196Z #48 0.550 DEBUG Requirement already installed: nvidia-cufile-cu12==1.13.1.3 (from file:///dist/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:28:23.7990606Z #48 0.550 DEBUG Requirement already installed: nvidia-cusolver-cu12==11.7.3.90 (from file:///dist/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:28:23.7992550Z #48 0.550 DEBUG Requirement already installed: pytorch-triton==3.4.0+gitf7888497 (from file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:28:23.7993713Z #48 0.550 DEBUG Requirement already installed: urllib3==2.5.0 2025-09-07T09:28:23.7994646Z #48 0.550 DEBUG Requirement already installed: nvidia-curand-cu12==10.3.9.90 (from file:///dist/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:28:23.7995592Z #48 0.550 DEBUG Requirement already installed: packaging==25.0 2025-09-07T09:28:23.7996125Z 
#48 0.550 DEBUG Identified uncached distribution: pynvml==13.0.1 2025-09-07T09:28:23.7997191Z #48 0.550 DEBUG Requirement already installed: nvidia-cuda-cupti-cu12==12.8.90 (from file:///dist/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:28:23.7998683Z #48 0.550 DEBUG Requirement already installed: nvidia-nccl-cu12==2.27.5 (from file:///dist/nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:28:23.7999930Z #48 0.550 DEBUG Requirement already installed: setuptools==78.1.0 (from file:///dist/setuptools-78.1.0-py3-none-any.whl) 2025-09-07T09:28:23.8001139Z #48 0.550 DEBUG Requirement already installed: markupsafe==3.0.2 (from file:///dist/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl) 2025-09-07T09:28:23.8002074Z #48 0.550 DEBUG Requirement already installed: einops==0.8.1 2025-09-07T09:28:23.8002575Z #48 0.550 DEBUG Requirement already installed: ninja==1.13.0 2025-09-07T09:28:23.8003117Z #48 0.550 DEBUG Requirement already installed: numpy==2.2.6 2025-09-07T09:28:23.8004282Z #48 0.550 DEBUG Requirement already installed: nvidia-cuda-runtime-cu12==12.8.90 (from file:///dist/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:28:23.8005332Z #48 0.550 DEBUG Requirement already installed: requests==2.32.5 2025-09-07T09:28:23.8006314Z #48 0.550 DEBUG Requirement already installed: nvidia-nvshmem-cu12==3.3.20 (from file:///dist/nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:28:23.8007603Z #48 0.550 DEBUG Requirement already installed: typing-extensions==4.14.1 (from file:///dist/typing_extensions-4.14.1-py3-none-any.whl) 2025-09-07T09:28:23.8008615Z #48 0.550 DEBUG Requirement already installed: mpmath==1.3.0 (from file:///dist/mpmath-1.3.0-py3-none-any.whl) 2025-09-07T09:28:23.8009501Z #48 0.550 DEBUG Requirement already installed: 
sympy==1.14.0 (from file:///dist/sympy-1.14.0-py3-none-any.whl) 2025-09-07T09:28:23.8010623Z #48 0.550 DEBUG Requirement already installed: nvidia-cusparselt-cu12==0.7.1 (from file:///dist/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl) 2025-09-07T09:28:23.8012058Z #48 0.550 DEBUG Requirement already installed: nvidia-cusparse-cu12==12.5.8.93 (from file:///dist/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:28:23.8013309Z #48 0.550 DEBUG Requirement already installed: idna==3.10 2025-09-07T09:28:23.8014351Z #48 0.550 DEBUG Requirement already installed: nvidia-cufft-cu12==11.3.3.83 (from file:///dist/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:28:23.8015356Z #48 0.550 DEBUG Requirement already installed: certifi==2025.8.3 2025-09-07T09:28:23.8015924Z #48 0.550 DEBUG Identified uncached distribution: cuda-python==12.9.0 2025-09-07T09:28:23.8016997Z #48 0.550 DEBUG Requirement already installed: nvidia-cuda-nvrtc-cu12==12.8.93 (from file:///dist/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T09:28:23.8018450Z #48 0.550 DEBUG Requirement already installed: nvidia-cudnn-cu12==9.10.2.21 (from file:///dist/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl) 2025-09-07T09:28:23.8019574Z #48 0.550 DEBUG Requirement already installed: jinja2==3.1.6 (from file:///dist/jinja2-3.1.6-py3-none-any.whl) 2025-09-07T09:28:23.8020378Z #48 0.550 DEBUG Identified uncached distribution: cuda-bindings==12.9.2 2025-09-07T09:28:23.8021059Z #48 0.550 DEBUG Identified uncached distribution: cuda-pathfinder==1.2.1 2025-09-07T09:28:23.8021846Z #48 0.550 DEBUG Requirement already installed: filelock==3.19.1 (from file:///dist/filelock-3.19.1-py3-none-any.whl) 2025-09-07T09:28:23.8023045Z #48 0.550 DEBUG Requirement already installed: torch==2.9.0.dev20250906+cu128 (from 
file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:28:23.8024065Z #48 0.550 DEBUG Identified uncached distribution: nvidia-ml-py==13.580.65 2025-09-07T09:28:23.8024941Z #48 0.550 DEBUG Requirement already installed: networkx==3.5 (from file:///dist/networkx-3.5-py3-none-any.whl) 2025-09-07T09:28:23.8025966Z #48 0.550 DEBUG Requirement already installed: fsspec==2025.7.0 (from file:///dist/fsspec-2025.7.0-py3-none-any.whl) 2025-09-07T09:28:23.8027097Z #48 0.550 DEBUG Requirement already installed: nvidia-nvtx-cu12==12.8.90 (from file:///dist/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl) 2025-09-07T09:28:23.8028108Z #48 0.550 DEBUG Identified uncached distribution: nvidia-cudnn-frontend==1.14.1 2025-09-07T09:28:23.8028726Z #48 0.550 DEBUG Requirement already installed: charset-normalizer==3.4.3 2025-09-07T09:28:23.8029747Z #48 0.550 DEBUG Requirement already installed: nvidia-nvjitlink-cu12==12.8.93 (from file:///dist/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl) 2025-09-07T09:28:23.8030730Z #48 0.550 DEBUG Unnecessary package: pyyaml==6.0.2 2025-09-07T09:28:23.8031170Z #48 0.550 DEBUG Unnecessary package: aiohappyeyeballs==2.6.1 2025-09-07T09:28:23.8031628Z #48 0.550 DEBUG Unnecessary package: aiohttp==3.12.15 2025-09-07T09:28:23.8032056Z #48 0.550 DEBUG Unnecessary package: aiosignal==1.4.0 2025-09-07T09:28:23.8032492Z #48 0.550 DEBUG Unnecessary package: annotated-types==0.7.0 2025-09-07T09:28:23.8032934Z #48 0.550 DEBUG Unnecessary package: anyio==4.10.0 2025-09-07T09:28:23.8033329Z #48 0.550 DEBUG Unnecessary package: astor==0.8.1 2025-09-07T09:28:23.8033740Z #48 0.550 DEBUG Unnecessary package: attrs==25.3.0 2025-09-07T09:28:23.8034135Z #48 0.550 DEBUG Unnecessary package: blake3==1.0.5 2025-09-07T09:28:23.8034704Z #48 0.550 DEBUG Unnecessary package: build==1.3.0 2025-09-07T09:28:23.8035131Z #48 0.550 DEBUG Unnecessary package: 
cachetools==6.2.0 2025-09-07T09:28:23.8035547Z #48 0.550 DEBUG Unnecessary package: cbor2==5.7.0 2025-09-07T09:28:23.8035950Z #48 0.550 DEBUG Unnecessary package: cffi==1.17.1 2025-09-07T09:28:23.8036343Z #48 0.550 DEBUG Unnecessary package: click==8.2.1 2025-09-07T09:28:23.8036771Z #48 0.550 DEBUG Unnecessary package: cloudpickle==3.1.1 2025-09-07T09:28:23.8037247Z #48 0.550 DEBUG Unnecessary package: compressed-tensors==0.11.0 2025-09-07T09:28:23.8037743Z #48 0.550 DEBUG Unnecessary package: cupy-cuda12x==13.6.0 2025-09-07T09:28:23.8038171Z #48 0.550 DEBUG Unnecessary package: depyf==0.19.0 2025-09-07T09:28:23.8038592Z #48 0.550 DEBUG Unnecessary package: dill==0.4.0 2025-09-07T09:28:23.8039013Z #48 0.550 DEBUG Unnecessary package: diskcache==5.6.3 2025-09-07T09:28:23.8039460Z #48 0.550 DEBUG Unnecessary package: distro==1.9.0 2025-09-07T09:28:23.8039888Z #48 0.550 DEBUG Unnecessary package: dnspython==2.7.0 2025-09-07T09:28:23.8040337Z #48 0.550 DEBUG Unnecessary package: email-validator==2.3.0 2025-09-07T09:28:23.8040797Z #48 0.550 DEBUG Unnecessary package: fastapi==0.116.1 2025-09-07T09:28:23.8041231Z #48 0.550 DEBUG Unnecessary package: fastapi-cli==0.0.10 2025-09-07T09:28:23.8041710Z #48 0.550 DEBUG Unnecessary package: fastapi-cloud-cli==0.1.5 2025-09-07T09:28:23.8042433Z #48 0.550 DEBUG Unnecessary package: fastrlock==0.8.3 2025-09-07T09:28:23.8042858Z #48 0.550 DEBUG Unnecessary package: frozendict==2.4.6 2025-09-07T09:28:23.8043304Z #48 0.550 DEBUG Unnecessary package: frozenlist==1.7.0 2025-09-07T09:28:23.8043717Z #48 0.550 DEBUG Unnecessary package: gguf==0.17.1 2025-09-07T09:28:23.8044120Z #48 0.550 DEBUG Unnecessary package: h11==0.16.0 2025-09-07T09:28:23.8044558Z #48 0.550 DEBUG Unnecessary package: hf-xet==1.1.9 2025-09-07T09:28:23.8045019Z #48 0.550 DEBUG Unnecessary package: httpcore==1.0.9 2025-09-07T09:28:23.8045440Z #48 0.550 DEBUG Unnecessary package: httptools==0.6.4 2025-09-07T09:28:23.8045868Z #48 0.550 DEBUG Unnecessary package: 
httpx==0.28.1 2025-09-07T09:28:23.8046323Z #48 0.550 DEBUG Unnecessary package: huggingface-hub==0.34.4 2025-09-07T09:28:23.8046787Z #48 0.550 DEBUG Unnecessary package: interegular==0.3.3 2025-09-07T09:28:23.8047235Z #48 0.550 DEBUG Unnecessary package: jiter==0.10.0 2025-09-07T09:28:23.8047655Z #48 0.550 DEBUG Unnecessary package: jsonschema==4.25.1 2025-09-07T09:28:23.8048184Z #48 0.550 DEBUG Unnecessary package: jsonschema-specifications==2025.4.1 2025-09-07T09:28:23.8048680Z #48 0.550 DEBUG Unnecessary package: lark==1.2.2 2025-09-07T09:28:23.8049107Z #48 0.550 DEBUG Unnecessary package: llguidance==0.7.30 2025-09-07T09:28:23.8049546Z #48 0.550 DEBUG Unnecessary package: llvmlite==0.44.0 2025-09-07T09:28:23.8050016Z #48 0.550 DEBUG Unnecessary package: lm-format-enforcer==0.11.3 2025-09-07T09:28:23.8050523Z #48 0.550 DEBUG Unnecessary package: markdown-it-py==4.0.0 2025-09-07T09:28:23.8050992Z #48 0.550 DEBUG Unnecessary package: mdurl==0.1.2 2025-09-07T09:28:23.8051433Z #48 0.550 DEBUG Unnecessary package: mistral-common==1.8.4 2025-09-07T09:28:23.8051870Z #48 0.550 DEBUG Unnecessary package: msgpack==1.1.1 2025-09-07T09:28:23.8052300Z #48 0.550 DEBUG Unnecessary package: msgspec==0.19.0 2025-09-07T09:28:23.8053011Z #48 0.550 DEBUG Unnecessary package: multidict==6.6.4 2025-09-07T09:28:23.8053475Z #48 0.550 DEBUG Unnecessary package: numba==0.61.2 2025-09-07T09:28:23.8053918Z #48 0.550 DEBUG Unnecessary package: openai==1.106.1 2025-09-07T09:28:23.8054372Z #48 0.550 DEBUG Unnecessary package: openai-harmony==0.0.4 2025-09-07T09:28:23.8054925Z #48 0.550 DEBUG Unnecessary package: opencv-python-headless==4.12.0.88 2025-09-07T09:28:23.8055450Z #48 0.550 DEBUG Unnecessary package: opt-einsum==3.4.0 2025-09-07T09:28:23.8055926Z #48 0.550 DEBUG Unnecessary package: outlines-core==0.2.10 2025-09-07T09:28:23.8056468Z #48 0.550 DEBUG Unnecessary package: partial-json-parser==0.2.1.1.post6 2025-09-07T09:28:23.8057387Z #48 0.550 DEBUG Unnecessary package: pillow==11.3.0 
(from file:///dist/pillow-11.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl) 2025-09-07T09:28:23.8058216Z #48 0.550 DEBUG Preserving seed package: pip==25.2 2025-09-07T09:28:23.8058695Z #48 0.550 DEBUG Unnecessary package: prometheus-client==0.22.1 2025-09-07T09:28:23.8059311Z #48 0.550 DEBUG Unnecessary package: prometheus-fastapi-instrumentator==7.1.0 2025-09-07T09:28:23.8059883Z #48 0.550 DEBUG Unnecessary package: propcache==0.3.2 2025-09-07T09:28:23.8060338Z #48 0.550 DEBUG Unnecessary package: protobuf==6.32.0 2025-09-07T09:28:23.8060768Z #48 0.550 DEBUG Unnecessary package: psutil==7.0.0 2025-09-07T09:28:23.8061219Z #48 0.550 DEBUG Unnecessary package: py-cpuinfo==9.0.0 2025-09-07T09:28:23.8061679Z #48 0.550 DEBUG Unnecessary package: pybase64==1.4.2 2025-09-07T09:28:23.8062116Z #48 0.550 DEBUG Unnecessary package: pycountry==24.6.1 2025-09-07T09:28:23.8062613Z #48 0.550 DEBUG Unnecessary package: pycparser==2.22 2025-09-07T09:28:23.8063052Z #48 0.550 DEBUG Unnecessary package: pydantic==2.11.7 2025-09-07T09:28:23.8063530Z #48 0.550 DEBUG Unnecessary package: pydantic-core==2.33.2 2025-09-07T09:28:23.8064046Z #48 0.550 DEBUG Unnecessary package: pydantic-extra-types==2.10.5 2025-09-07T09:28:23.8064555Z #48 0.550 DEBUG Unnecessary package: pygments==2.19.2 2025-09-07T09:28:23.8065139Z #48 0.550 DEBUG Unnecessary package: pyproject-hooks==1.2.0 2025-09-07T09:28:23.8065605Z #48 0.550 DEBUG Unnecessary package: python-dotenv==1.1.1 2025-09-07T09:28:23.8066098Z #48 0.550 DEBUG Unnecessary package: python-json-logger==3.3.0 2025-09-07T09:28:23.8066589Z #48 0.550 DEBUG Unnecessary package: python-multipart==0.0.20 2025-09-07T09:28:23.8067049Z #48 0.550 DEBUG Unnecessary package: pyzmq==27.0.2 2025-09-07T09:28:23.8067482Z #48 0.550 DEBUG Unnecessary package: ray==2.49.1 2025-09-07T09:28:23.8067908Z #48 0.550 DEBUG Unnecessary package: referencing==0.36.2 2025-09-07T09:28:23.8068372Z #48 0.550 DEBUG Unnecessary package: regex==2025.9.1 
2025-09-07T09:28:23.8068793Z #48 0.550 DEBUG Unnecessary package: rich==14.1.0 2025-09-07T09:28:23.8069227Z #48 0.550 DEBUG Unnecessary package: rich-toolkit==0.15.1 2025-09-07T09:28:23.8069655Z #48 0.550 DEBUG Unnecessary package: rignore==0.6.4 2025-09-07T09:28:23.8070087Z #48 0.550 DEBUG Unnecessary package: rpds-py==0.27.1 2025-09-07T09:28:23.8070519Z #48 0.550 DEBUG Unnecessary package: safetensors==0.6.2 2025-09-07T09:28:23.8070960Z #48 0.550 DEBUG Unnecessary package: scipy==1.16.1 2025-09-07T09:28:23.8071386Z #48 0.550 DEBUG Unnecessary package: sentencepiece==0.2.1 2025-09-07T09:28:23.8071848Z #48 0.550 DEBUG Unnecessary package: sentry-sdk==2.37.0 2025-09-07T09:28:23.8072304Z #48 0.550 DEBUG Unnecessary package: setproctitle==1.3.7 2025-09-07T09:28:23.8072751Z #48 0.550 DEBUG Unnecessary package: shellingham==1.5.4 2025-09-07T09:28:23.8073182Z #48 0.550 DEBUG Unnecessary package: six==1.17.0 2025-09-07T09:28:23.8073586Z #48 0.550 DEBUG Unnecessary package: sniffio==1.3.1 2025-09-07T09:28:23.8074016Z #48 0.550 DEBUG Unnecessary package: soundfile==0.13.1 2025-09-07T09:28:23.8074440Z #48 0.550 DEBUG Unnecessary package: soxr==0.5.0.post1 2025-09-07T09:28:23.8074874Z #48 0.550 DEBUG Unnecessary package: starlette==0.47.3 2025-09-07T09:28:23.8075296Z #48 0.550 DEBUG Unnecessary package: tiktoken==0.11.0 2025-09-07T09:28:23.8075766Z #48 0.550 DEBUG Unnecessary package: tokenizers==0.22.0 2025-09-07T09:28:23.8076693Z #48 0.550 DEBUG Unnecessary package: torchaudio==2.8.0.dev20250906+cu128 (from file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:28:23.8078095Z #48 0.550 DEBUG Unnecessary package: torchvision==0.24.0.dev20250906+cu128 (from file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl) 2025-09-07T09:28:23.8079020Z #48 0.550 DEBUG Unnecessary package: tqdm==4.67.1 2025-09-07T09:28:23.8079449Z #48 0.550 DEBUG Unnecessary package: transformers==4.56.1 
2025-09-07T09:28:23.8079904Z #48 0.550 DEBUG Unnecessary package: triton==3.4.0 2025-09-07T09:28:23.8080328Z #48 0.550 DEBUG Unnecessary package: typer==0.17.4 2025-09-07T09:28:23.8080778Z #48 0.550 DEBUG Unnecessary package: typing-inspection==0.4.1 2025-09-07T09:28:23.8081242Z #48 0.550 DEBUG Preserving seed package: uv==0.8.4 2025-09-07T09:28:23.8081654Z #48 0.550 DEBUG Unnecessary package: uvicorn==0.35.0 2025-09-07T09:28:23.8082079Z #48 0.550 DEBUG Unnecessary package: uvloop==0.21.0 2025-09-07T09:28:23.8083018Z #48 0.550 DEBUG Unnecessary package: vllm==0.10.2rc2.dev125+g4172235ab.d20250907 (from file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-linux_x86_64.whl) 2025-09-07T09:28:23.8083976Z #48 0.550 DEBUG Unnecessary package: watchfiles==1.1.0 2025-09-07T09:28:23.8084428Z #48 0.550 DEBUG Unnecessary package: websockets==15.0.1 2025-09-07T09:28:23.8084853Z #48 0.550 DEBUG Unnecessary package: wheel==0.45.1 2025-09-07T09:28:23.8085813Z #48 0.550 DEBUG Unnecessary package: xformers==0.0.33+5d4b92a5.d20250907 (from file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl) 2025-09-07T09:28:23.8086723Z #48 0.550 DEBUG Unnecessary package: xgrammar==0.1.23 2025-09-07T09:28:23.8087146Z #48 0.550 DEBUG Unnecessary package: yarl==1.20.1 2025-09-07T09:28:23.8088442Z #48 0.550 DEBUG No cache entry for: https://files.pythonhosted.org/packages/26/15/3dbe02186dc0daaa8410aa1c1c368d36967b88035ce1cea663e9ba11312a/cuda_bindings-12.9.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:28:23.8090622Z #48 0.550 DEBUG No cache entry for: https://files.pythonhosted.org/packages/b7/b8/5f812452c653447b4c09fec3cf0c5192abab1ce18358fcfab16a70113cfa/nvidia_cudnn_frontend-1.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:28:23.8093087Z #48 0.550 DEBUG No cache entry for: 
https://files.pythonhosted.org/packages/f9/96/88a5cb161c61cab2ee65b5aa61e612901fbcb1660024f0ccb26fcb02a17c/nvidia_ml_py-13.580.65-py3-none-any.whl 2025-09-07T09:28:23.8095016Z #48 0.550 DEBUG No cache entry for: https://files.pythonhosted.org/packages/d7/4a/cac76c174bb439a0c46c9a4413fcbea5c6cabfb01879f7bbdb9fdfaed76c/pynvml-13.0.1-py3-none-any.whl 2025-09-07T09:28:23.8096767Z #48 0.550 DEBUG No cache entry for: https://files.pythonhosted.org/packages/22/54/6231878f6fc490f222c87190ce12196b67b7700b30818882a87f478e4944/cuda_pathfinder-1.2.1-py3-none-any.whl 2025-09-07T09:28:23.8098552Z #48 0.550 DEBUG No cache entry for: https://files.pythonhosted.org/packages/24/3c/4475aebeaab9651f2e61000fbe76f91a476d371dbfbf0a1cf46e689af253/cuda_python-12.9.0-py3-none-any.whl 2025-09-07T09:28:23.8099641Z #48 0.554 Downloading cuda-bindings (11.9MiB) 2025-09-07T09:28:23.8100068Z #48 0.554 Downloading nvidia-cudnn-frontend (1.7MiB) 2025-09-07T09:28:23.8100502Z #48 0.664 Downloading nvidia-cudnn-frontend 2025-09-07T09:28:23.9812746Z #48 0.749 Downloading cuda-bindings 2025-09-07T09:28:23.9813337Z #48 0.749 Prepared 7 packages in 198ms 2025-09-07T09:28:24.1702826Z #48 1.088 Installed 7 packages in 339ms 2025-09-07T09:28:24.1703253Z #48 1.088 + cuda-bindings==12.9.2 2025-09-07T09:28:24.1703600Z #48 1.088 + cuda-pathfinder==1.2.1 2025-09-07T09:28:24.1703995Z #48 1.088 + cuda-python==12.9.0 2025-09-07T09:28:24.3214106Z #48 1.088 + flashinfer-python==0.2.14.post1 (from file:///workspace/wheels/flashinfer/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl) 2025-09-07T09:28:24.3216698Z #48 1.088 + nvidia-cudnn-frontend==1.14.1 2025-09-07T09:28:24.3217633Z #48 1.088 + nvidia-ml-py==13.580.65 2025-09-07T09:28:24.3218467Z #48 1.088 + pynvml==13.0.1 2025-09-07T09:28:24.3219426Z #48 1.089 DEBUG Released lock at `/tmp/uv-281d6a3886c08524.lock` 2025-09-07T09:28:33.8234127Z #48 DONE 10.7s 2025-09-07T09:28:33.9766986Z 2025-09-07T09:28:33.9768014Z #49 [vllm-base 17/18] RUN pip freeze | grep -E 
'torch|xformers|vllm|flashinfer' 2025-09-07T09:28:34.6942779Z #49 0.869 flashinfer-python @ file:///workspace/wheels/flashinfer/flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl 2025-09-07T09:28:34.6944081Z #49 0.869 pytorch-triton @ file:///dist/pytorch_triton-3.4.0+gitf7888497-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:28:34.6945192Z #49 0.869 torch @ file:///dist/torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T09:28:34.6946022Z #49 0.869 torchaudio @ file:///dist/torchaudio-2.8.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T09:28:34.6946955Z #49 0.869 torchvision @ file:///dist/torchvision-0.24.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl 2025-09-07T09:28:34.6947849Z #49 0.869 vllm @ file:///wheels/vllm/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-linux_x86_64.whl 2025-09-07T09:28:34.6948703Z #49 0.869 xformers @ file:///wheels/xformers/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T09:28:34.8654384Z #49 DONE 0.9s 2025-09-07T09:28:34.8654853Z 2025-09-07T09:28:34.8656502Z #50 [vllm-base 18/18] RUN uv pip freeze | grep -i '^torch\|^torchvision\|^torchaudio\|^xformers\|^vllm\|^flashinfer' > build_summary.txt 2025-09-07T09:28:35.5094072Z #50 0.795 Using Python 3.12.11 environment at: /opt/python/cp312-cp312 2025-09-07T09:28:35.7334325Z #50 DONE 0.8s 2025-09-07T09:28:35.7334526Z 2025-09-07T09:28:35.7334941Z #51 [export-wheels 3/4] COPY --from=vllm-base /workspace/build_summary.txt /wheels/build_summary.txt 2025-09-07T09:28:35.7335535Z #51 DONE 0.0s 2025-09-07T09:28:36.0040240Z 2025-09-07T09:28:36.0040901Z #52 [export-wheels 4/4] COPY --from=vllm-base /workspace/wheels/flashinfer /wheels/flashinfer-python 2025-09-07T09:28:36.1431005Z #52 DONE 0.0s 2025-09-07T09:28:36.1431360Z 2025-09-07T09:28:36.1431510Z #53 exporting to client directory 2025-09-07T09:28:36.1431910Z #53 copying files 55.85MB 0.1s 2025-09-07T09:28:39.5062026Z #53 
copying files 862.80MB 3.3s done 2025-09-07T09:28:45.9573541Z #53 DONE 9.9s 2025-09-07T09:28:46.0503715Z 2025-09-07 09:28:46,049 [INFO] cli.lib.core.vllm.vllm_build: Generate GH Summary ... 2025-09-07T09:28:46.0981970Z ##[group]Run set -eux 2025-09-07T09:28:46.0982326Z set -eux 2025-09-07T09:28:46.0982582Z  2025-09-07T09:28:46.0983127Z # Get these wheels ready, the vllm renaming logic is copied from its .buildkite/scripts/upload-wheels.sh 2025-09-07T09:28:46.0983853Z docker exec -t "${container_name}" bash -c " 2025-09-07T09:28:46.0984239Z  set -eux 2025-09-07T09:28:46.0984499Z  2025-09-07T09:28:46.0985080Z  nightly=\$(unzip -p torch-* '**/METADATA' | grep '^Version: ' | cut -d' ' -f2 | cut -d'.' -f4) 2025-09-07T09:28:46.0985704Z  2025-09-07T09:28:46.0985951Z  pushd externals/vllm/wheels 2025-09-07T09:28:46.0986359Z  for package in xformers flashinfer-python vllm; do 2025-09-07T09:28:46.0986772Z  pushd \$package 2025-09-07T09:28:46.0987112Z  auditwheel repair --plat \$PLATFORM *.whl \ 2025-09-07T09:28:46.0987715Z  --exclude libc10* --exclude libtorch* --exclude libcu* --exclude libnv* 2025-09-07T09:28:46.0988407Z  repair_wheel=\$(find wheelhouse -name *\${PLATFORM}*) 2025-09-07T09:28:46.0988858Z  repair_wheel=\$(basename \${repair_wheel}) 2025-09-07T09:28:46.0989210Z  popd 2025-09-07T09:28:46.0989428Z  2025-09-07T09:28:46.0989693Z  cp \${package}/wheelhouse/\${repair_wheel} . 2025-09-07T09:28:46.0990213Z  version=\$(unzip -p \$repair_wheel '**/METADATA' | grep '^Version: ' | cut -d' ' -f2) 2025-09-07T09:28:46.0990689Z  2025-09-07T09:28:46.0990913Z  if [[ \$package == vllm ]]; then 2025-09-07T09:28:46.0991311Z  new_wheel=\${repair_wheel/\$version/1.0.0.\$nightly} 2025-09-07T09:28:46.0991668Z  else 2025-09-07T09:28:46.0992263Z  major_version=\$(echo \$version | tr '.+' '.' | cut -d'.' 
-f1-3) 2025-09-07T09:28:46.0993028Z  new_wheel=\${repair_wheel/\$version/\$major_version.\$nightly} 2025-09-07T09:28:46.0993480Z  fi 2025-09-07T09:28:46.0993728Z  2025-09-07T09:28:46.0993983Z  mv -- \$repair_wheel \$new_wheel 2025-09-07T09:28:46.0994359Z  rm -rf \$package 2025-09-07T09:28:46.0994650Z  done 2025-09-07T09:28:46.0994902Z  popd 2025-09-07T09:28:46.0995135Z " 2025-09-07T09:28:46.0995365Z  2025-09-07T09:28:46.0995730Z docker exec -t "${container_name}" chown -R 1000:1000 /artifacts 2025-09-07T09:28:46.1006661Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:28:46.1007046Z env: 2025-09-07T09:28:46.1007245Z PY_VERS: 3.12 2025-09-07T09:28:46.1007561Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T09:28:46.1007940Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T09:28:46.1008235Z BUILD_DEVICE: cu128 2025-09-07T09:28:46.1008540Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T09:28:46.1009105Z container_name: fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be 2025-09-07T09:28:46.1009587Z ##[endgroup] 2025-09-07T09:28:46.1040383Z + docker exec -t fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be bash -c ' 2025-09-07T09:28:46.1040986Z set -eux 2025-09-07T09:28:46.1041132Z 2025-09-07T09:28:46.1041483Z nightly=$(unzip -p torch-* '\''**/METADATA'\'' | grep '\''^Version: '\'' | cut -d'\'' '\'' -f2 | cut -d'\''.'\'' -f4) 2025-09-07T09:28:46.1041953Z 2025-09-07T09:28:46.1042080Z pushd externals/vllm/wheels 2025-09-07T09:28:46.1042447Z for package in xformers flashinfer-python vllm; do 2025-09-07T09:28:46.1042904Z pushd $package 2025-09-07T09:28:46.1043479Z auditwheel repair --plat $PLATFORM *.whl --exclude libc10* --exclude libtorch* --exclude libcu* --exclude libnv* 2025-09-07T09:28:46.1044194Z repair_wheel=$(find wheelhouse -name *${PLATFORM}*) 2025-09-07T09:28:46.1044719Z repair_wheel=$(basename ${repair_wheel}) 2025-09-07T09:28:46.1045060Z popd 2025-09-07T09:28:46.1045315Z 
2025-09-07T09:28:46.1045530Z cp ${package}/wheelhouse/${repair_wheel} . 2025-09-07T09:28:46.1046096Z version=$(unzip -p $repair_wheel '\''**/METADATA'\'' | grep '\''^Version: '\'' | cut -d'\'' '\'' -f2) 2025-09-07T09:28:46.1046554Z 2025-09-07T09:28:46.1046665Z if [[ $package == vllm ]]; then 2025-09-07T09:28:46.1047026Z new_wheel=${repair_wheel/$version/1.0.0.$nightly} 2025-09-07T09:28:46.1047395Z else 2025-09-07T09:28:46.1047767Z major_version=$(echo $version | tr '\''.+'\'' '\''.'\'' | cut -d'\''.'\'' -f1-3) 2025-09-07T09:28:46.1048323Z new_wheel=${repair_wheel/$version/$major_version.$nightly} 2025-09-07T09:28:46.1048729Z fi 2025-09-07T09:28:46.1048852Z 2025-09-07T09:28:46.1048962Z mv -- $repair_wheel $new_wheel 2025-09-07T09:28:46.1049284Z rm -rf $package 2025-09-07T09:28:46.1049518Z done 2025-09-07T09:28:46.1049734Z popd 2025-09-07T09:28:46.1049940Z ' 2025-09-07T09:28:46.2959711Z ++ unzip -p torch-2.9.0.dev20250906+cu128-cp312-cp312-manylinux_2_28_x86_64.whl '**/METADATA' 2025-09-07T09:28:46.2960339Z ++ grep '^Version: ' 2025-09-07T09:28:46.2960611Z ++ cut '-d ' -f2 2025-09-07T09:28:46.2960868Z ++ cut -d. 
-f4 2025-09-07T09:28:46.6991115Z + nightly=dev20250906+cu128 2025-09-07T09:28:46.6991488Z + pushd externals/vllm/wheels 2025-09-07T09:28:46.6991840Z /artifacts/externals/vllm/wheels /artifacts 2025-09-07T09:28:46.6992605Z + for package in xformers flashinfer-python vllm 2025-09-07T09:28:46.6993029Z + pushd xformers 2025-09-07T09:28:46.6993538Z /artifacts/externals/vllm/wheels/xformers /artifacts/externals/vllm/wheels /artifacts 2025-09-07T09:28:46.6994789Z + auditwheel repair --plat manylinux_2_28_x86_64 xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl --exclude 'libc10*' --exclude 'libtorch*' --exclude 'libcu*' --exclude 'libnv*' 2025-09-07T09:28:46.9845325Z INFO:auditwheel.main_repair:Repairing xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-linux_x86_64.whl 2025-09-07T09:28:52.0316694Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:28:52.0317284Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:28:52.0317733Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:28:52.0318245Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:28:52.0318750Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:28:52.0319192Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:28:52.3045200Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:28:52.3045661Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:28:52.3046085Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:28:52.3046534Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:28:52.3046958Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:28:52.3047397Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:28:52.5824168Z INFO:auditwheel.main_repair:Wheel is eligible for a higher priority tag. You requested manylinux_2_28_x86_64 but I have found this wheel is eligible for manylinux_2_27_x86_64. 
2025-09-07T09:28:57.6262491Z INFO:auditwheel.wheeltools:Previous filename tags: linux_x86_64 2025-09-07T09:28:57.6263332Z INFO:auditwheel.wheeltools:New filename tags: manylinux_2_27_x86_64, manylinux_2_28_x86_64 2025-09-07T09:28:57.6264163Z INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp39-abi3-linux_x86_64 2025-09-07T09:28:57.6265204Z INFO:auditwheel.wheeltools:New WHEEL info tags: cp39-abi3-manylinux_2_27_x86_64, cp39-abi3-manylinux_2_28_x86_64 2025-09-07T09:29:49.2587653Z INFO:auditwheel.main_repair: 2025-09-07T09:29:49.2588765Z Fixed-up wheel written to /artifacts/externals/vllm/wheels/xformers/wheelhouse/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:29:49.2818346Z ++ find wheelhouse -name '*manylinux_2_28_x86_64*' 2025-09-07T09:29:49.2852993Z + repair_wheel=wheelhouse/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:29:49.2856553Z ++ basename wheelhouse/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:29:49.2889564Z + repair_wheel=xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:29:49.2890219Z + popd 2025-09-07T09:29:49.2890492Z /artifacts/externals/vllm/wheels /artifacts 2025-09-07T09:29:49.2891232Z + cp xformers/wheelhouse/xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl . 2025-09-07T09:29:49.4323649Z ++ unzip -p xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl '**/METADATA' 2025-09-07T09:29:49.4324406Z ++ grep '^Version: ' 2025-09-07T09:29:49.4324668Z ++ cut '-d ' -f2 2025-09-07T09:29:49.4390009Z + version=0.0.33+5d4b92a5.d20250907 2025-09-07T09:29:49.4390792Z + [[ xformers == vllm ]] 2025-09-07T09:29:49.4393127Z ++ echo 0.0.33+5d4b92a5.d20250907 2025-09-07T09:29:49.4393937Z ++ tr .+ . 2025-09-07T09:29:49.4395382Z ++ cut -d. 
-f1-3 2025-09-07T09:29:49.4424387Z + major_version=0.0.33 2025-09-07T09:29:49.4425152Z + new_wheel=xformers-0.0.33.dev20250906+cu128-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:29:49.4426424Z + mv -- xformers-0.0.33+5d4b92a5.d20250907-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl xformers-0.0.33.dev20250906+cu128-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:29:49.4453357Z + rm -rf xformers 2025-09-07T09:29:49.4924461Z + for package in xformers flashinfer-python vllm 2025-09-07T09:29:49.4924900Z + pushd flashinfer-python 2025-09-07T09:29:49.4925502Z /artifacts/externals/vllm/wheels/flashinfer-python /artifacts/externals/vllm/wheels /artifacts 2025-09-07T09:29:49.4926739Z + auditwheel repair --plat manylinux_2_28_x86_64 flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl --exclude 'libc10*' --exclude 'libtorch*' --exclude 'libcu*' --exclude 'libnv*' 2025-09-07T09:29:49.6197569Z INFO:auditwheel.main_repair:Repairing flashinfer_python-0.2.14.post1-cp39-abi3-linux_x86_64.whl 2025-09-07T09:29:52.4843434Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:52.4843945Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:52.4844382Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:52.4844841Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:52.4845267Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:52.4845702Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:52.5556880Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:52.5557857Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:52.5558351Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:52.5558813Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:52.5559237Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:52.5559667Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 
2025-09-07T09:29:52.6215018Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:52.6215504Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:52.6215967Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:52.6216429Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:52.6216877Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:52.6217302Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:52.6894294Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:52.6894770Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:52.6895230Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:52.6895681Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:52.6896135Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:52.6896565Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:52.7581581Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:52.7582236Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:52.7582904Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:52.7583515Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:52.7584018Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:52.7584465Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:52.8233901Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:52.8234377Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:52.8234812Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:52.8235264Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:52.8235691Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:52.8236123Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:52.8896154Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:52.8896677Z INFO:auditwheel.lddtree:Excluding 
libc10_cuda.so 2025-09-07T09:29:52.8897163Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:52.8897695Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:52.8898214Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:52.8898646Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:52.9603562Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:52.9604048Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:52.9604604Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:52.9605058Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:52.9605482Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:52.9605908Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.0293930Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.0294394Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.0294855Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.0295307Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.0295780Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.0296229Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.0975778Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.0976259Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.0976703Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.0977163Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.0977599Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.0978034Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.1615512Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.1615969Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.1616429Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.1616886Z 
INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.1617335Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.1617773Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.2394684Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.2395176Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.2395623Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.2396085Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.2396520Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.2396959Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.3071639Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.3072110Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.3072557Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.3072991Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.3073430Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.3073843Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.3753517Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.3754182Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.3754698Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.3755154Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.3755578Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.3756009Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.4433873Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.4434633Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.5167688Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.5168176Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.5168673Z INFO:auditwheel.lddtree:Excluding libtorch.so 
2025-09-07T09:29:53.5169108Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.5169519Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.5169948Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.5170377Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.5170828Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.5171265Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.5171677Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.5892539Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.5893470Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.5894011Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.5894479Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.5894931Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.5895360Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.6615412Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.6615960Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.6616452Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.6616956Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.6617402Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.6617846Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.7335004Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.7335559Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.7336085Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.7336539Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.7337070Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.7337566Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.8020741Z INFO:auditwheel.lddtree:Excluding 
libc10.so 2025-09-07T09:29:53.8021275Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.8021765Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.8022276Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.8022796Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.8023265Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.8728600Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.8729088Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.8729528Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.8729979Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.8730419Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.8730837Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:53.9516585Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:53.9517060Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:53.9517488Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:53.9517935Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:53.9518368Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:53.9518927Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.0218895Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.0219738Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.0220200Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.0220670Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.0221110Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.0221548Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.0881024Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.0881494Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.0881938Z 
INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.0882374Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.0882809Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.0883222Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.1621293Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.1621776Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.1622237Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.1622703Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.1623140Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.1623583Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.2340079Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.2340563Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.2341009Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.2341472Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.2341921Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.2342349Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.3079679Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.3080205Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.3080677Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.3081118Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.3081553Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.3081976Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.3774214Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.3774693Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.3775137Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.3775599Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 
2025-09-07T09:29:54.3776036Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.3776476Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.4451181Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.4451704Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.4452151Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.4452705Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.4453338Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.4453803Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.5242717Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.5243503Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.5244024Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.5244485Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.5244913Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.5245344Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.6010536Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.6011072Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.6011524Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.6011955Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.6012675Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.6013413Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.6774565Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.6775033Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.6775617Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.6776352Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.6776922Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.6777421Z INFO:auditwheel.lddtree:Excluding 
libcudart.so.12 2025-09-07T09:29:54.7475431Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.7475950Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.7476600Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.7477136Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.7477701Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.7478254Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.8155448Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.8156074Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.8156605Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.8157187Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.8157763Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.8158250Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.8849941Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.8850576Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.8851133Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.8851753Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.8852259Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.8853089Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:54.9512371Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:54.9513256Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:54.9513925Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:54.9514439Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:54.9515132Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:54.9515679Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.0151959Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.0152923Z 
INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.0153689Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.0154265Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.0154777Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.0155337Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.0995873Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.0996397Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.0997060Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.0997625Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.0998201Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.0998778Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.1654294Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.1654891Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.1655386Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.1656099Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.1656607Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.1657135Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.2279494Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.2280438Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.2280959Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.2281781Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.2282455Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.2282972Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.2951942Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.2952467Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.2953218Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 
2025-09-07T09:29:55.2953766Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.2954305Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.2954861Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.3591553Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.3592889Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.3593822Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.3594373Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.3594955Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.3595439Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.4394178Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.4394767Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.4395265Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.4395715Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.4396166Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.4396593Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.5161412Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.5161927Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.5162418Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.5162862Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.5163295Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.5163719Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.5800673Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.5801156Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.5801615Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.5802068Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.5802516Z INFO:auditwheel.lddtree:Excluding 
libtorch.so 2025-09-07T09:29:55.5802955Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.6493973Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.6494475Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.6494922Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.6495388Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.6495821Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.6496259Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.7164347Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.7165105Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.7165671Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.7166122Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.7166565Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.7166983Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.7942956Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.7943439Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.7943888Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.7944350Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.7944899Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.7945324Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.8616223Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.8616695Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.8617318Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.8617878Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.8618383Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.8618815Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.9254194Z 
INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.9254663Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.9255107Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.9255570Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.9256005Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.9256442Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:55.9935225Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:55.9935696Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:55.9936141Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:55.9936607Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:55.9937047Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:55.9937491Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:56.0595858Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:56.0596324Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:56.0596785Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:56.0597230Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:56.0597677Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:56.0598101Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:56.1300317Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:56.1300757Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:56.1301213Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:56.1301684Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:56.1302128Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:56.1302575Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:56.1993774Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:56.1994255Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 
2025-09-07T09:29:56.1994697Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:56.1995157Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:56.1995594Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:56.1996030Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:56.2634254Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:56.2635013Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:56.2635536Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:56.2635998Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:56.2636427Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:56.2636860Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:56.3335428Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:56.3335904Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:56.3336360Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:56.3336808Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:56.3337254Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:56.3337680Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:56.4047337Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:56.4048127Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:56.4048648Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:56.4049152Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:56.4049573Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:56.4050000Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:56.4724008Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:56.4724693Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:56.4725230Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:56.4725783Z INFO:auditwheel.lddtree:Excluding 
libtorch_cuda.so 2025-09-07T09:29:56.4726211Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:56.4726639Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:56.5431096Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:56.5431841Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:56.5432378Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:56.5432848Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:56.5433274Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:56.5433703Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:56.6090361Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:29:56.6090919Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:29:56.6091370Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:29:56.6091827Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:29:56.6092732Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:29:56.6093160Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:29:56.9237959Z INFO:auditwheel.main_repair:Wheel is eligible for a higher priority tag. You requested manylinux_2_28_x86_64 but I have found this wheel is eligible for manylinux_2_27_x86_64. 
2025-09-07T09:29:59.7894827Z INFO:auditwheel.wheeltools:Previous filename tags: linux_x86_64 2025-09-07T09:29:59.7895584Z INFO:auditwheel.wheeltools:New filename tags: manylinux_2_27_x86_64, manylinux_2_28_x86_64 2025-09-07T09:29:59.7896359Z INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp39-abi3-linux_x86_64 2025-09-07T09:29:59.7897224Z INFO:auditwheel.wheeltools:New WHEEL info tags: cp39-abi3-manylinux_2_27_x86_64, cp39-abi3-manylinux_2_28_x86_64 2025-09-07T09:30:29.6919691Z INFO:auditwheel.main_repair: 2025-09-07T09:30:29.6920837Z Fixed-up wheel written to /artifacts/externals/vllm/wheels/flashinfer-python/wheelhouse/flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:30:29.7251312Z ++ find wheelhouse -name '*manylinux_2_28_x86_64*' 2025-09-07T09:30:29.7318232Z + repair_wheel=wheelhouse/flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:30:29.7319282Z ++ basename wheelhouse/flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:30:29.7350351Z + repair_wheel=flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:30:29.7350983Z + popd 2025-09-07T09:30:29.7351262Z /artifacts/externals/vllm/wheels /artifacts 2025-09-07T09:30:29.7352038Z + cp flashinfer-python/wheelhouse/flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl . 2025-09-07T09:30:29.8122043Z ++ unzip -p flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl '**/METADATA' 2025-09-07T09:30:29.8122827Z ++ grep '^Version: ' 2025-09-07T09:30:29.8123127Z ++ cut '-d ' -f2 2025-09-07T09:30:30.0016654Z + version=0.2.14.post1 2025-09-07T09:30:30.0016998Z + [[ flashinfer-python == vllm ]] 2025-09-07T09:30:30.0021976Z ++ echo 0.2.14.post1 2025-09-07T09:30:30.0023296Z ++ tr .+ . 2025-09-07T09:30:30.0024526Z ++ cut -d. 
-f1-3 2025-09-07T09:30:30.0054109Z + major_version=0.2.14 2025-09-07T09:30:30.0054781Z + new_wheel=flashinfer_python-0.2.14.dev20250906+cu128-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:30:30.0056161Z + mv -- flashinfer_python-0.2.14.post1-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl flashinfer_python-0.2.14.dev20250906+cu128-cp39-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:30:30.0082238Z + rm -rf flashinfer-python 2025-09-07T09:30:30.0340003Z + for package in xformers flashinfer-python vllm 2025-09-07T09:30:30.0340603Z + pushd vllm 2025-09-07T09:30:30.0341199Z /artifacts/externals/vllm/wheels/vllm /artifacts/externals/vllm/wheels /artifacts 2025-09-07T09:30:30.0342560Z + auditwheel repair --plat manylinux_2_28_x86_64 vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-linux_x86_64.whl --exclude 'libc10*' --exclude 'libtorch*' --exclude 'libcu*' --exclude 'libnv*' 2025-09-07T09:30:30.1613837Z INFO:auditwheel.main_repair:Repairing vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-linux_x86_64.whl 2025-09-07T09:30:37.1634326Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:30:37.1634839Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:30:37.1635291Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T09:30:37.1635712Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:30:37.1636157Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:30:37.1636597Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:30:37.1636998Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:30:37.4480563Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:30:37.4481123Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:30:37.4481586Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T09:30:37.4482006Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:30:37.4482449Z INFO:auditwheel.lddtree:Excluding 
libtorch_cuda.so 2025-09-07T09:30:37.4482883Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:30:37.4483283Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:30:37.5174986Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:30:37.5175458Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:30:37.5175911Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T09:30:37.5176355Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:30:37.5176811Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:30:37.5177248Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:30:37.5177688Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:30:37.6205782Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T09:30:37.6206261Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:30:37.6206691Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:30:37.6207112Z INFO:auditwheel.lddtree:Excluding libnvrtc.so.12 2025-09-07T09:30:37.6207530Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:30:37.6207966Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:30:37.6208399Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:30:37.6952223Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:30:37.6952776Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:30:37.6953230Z INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T09:30:37.6953649Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:30:37.6954092Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:30:37.6954525Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:30:37.6954945Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:30:37.8240883Z INFO:auditwheel.lddtree:Excluding libtorch.so 2025-09-07T09:30:37.8241717Z INFO:auditwheel.lddtree:Excluding libcudart.so.12 2025-09-07T09:30:37.8242252Z 
INFO:auditwheel.lddtree:Excluding libcuda.so.1 2025-09-07T09:30:37.8242703Z INFO:auditwheel.lddtree:Excluding libtorch_cpu.so 2025-09-07T09:30:37.8243140Z INFO:auditwheel.lddtree:Excluding libtorch_cuda.so 2025-09-07T09:30:37.8243583Z INFO:auditwheel.lddtree:Excluding libc10_cuda.so 2025-09-07T09:30:37.8244001Z INFO:auditwheel.lddtree:Excluding libc10.so 2025-09-07T09:30:38.3254868Z INFO:auditwheel.main_repair:Wheel is eligible for a higher priority tag. You requested manylinux_2_28_x86_64 but I have found this wheel is eligible for manylinux_2_24_x86_64. 2025-09-07T09:30:45.3333267Z INFO:auditwheel.wheeltools:Previous filename tags: linux_x86_64 2025-09-07T09:30:45.3334876Z INFO:auditwheel.wheeltools:New filename tags: manylinux_2_24_x86_64, manylinux_2_28_x86_64 2025-09-07T09:30:45.3336031Z INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp38-abi3-linux_x86_64 2025-09-07T09:30:45.3336968Z INFO:auditwheel.wheeltools:New WHEEL info tags: cp38-abi3-manylinux_2_24_x86_64, cp38-abi3-manylinux_2_28_x86_64 2025-09-07T09:31:59.3244175Z INFO:auditwheel.main_repair: 2025-09-07T09:31:59.3247900Z Fixed-up wheel written to /artifacts/externals/vllm/wheels/vllm/wheelhouse/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:31:59.3489123Z ++ find wheelhouse -name '*manylinux_2_28_x86_64*' 2025-09-07T09:31:59.3518971Z + repair_wheel=wheelhouse/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:31:59.3520112Z ++ basename wheelhouse/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:31:59.3551177Z + repair_wheel=vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:31:59.3551894Z + popd 2025-09-07T09:31:59.3552178Z /artifacts/externals/vllm/wheels /artifacts 2025-09-07T09:31:59.3552924Z + cp 
vllm/wheelhouse/vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl . 2025-09-07T09:31:59.5542216Z ++ unzip -p vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl '**/METADATA' 2025-09-07T09:31:59.5543041Z ++ grep '^Version: ' 2025-09-07T09:31:59.5543313Z ++ cut '-d ' -f2 2025-09-07T09:31:59.6524549Z + version=0.10.2rc2.dev125+g4172235ab.d20250907 2025-09-07T09:31:59.6525265Z + [[ vllm == vllm ]] 2025-09-07T09:31:59.6525870Z + new_wheel=vllm-1.0.0.dev20250906+cu128-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:31:59.6527118Z + mv -- vllm-0.10.2rc2.dev125+g4172235ab.d20250907-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl vllm-1.0.0.dev20250906+cu128-cp38-abi3-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl 2025-09-07T09:31:59.6550938Z + rm -rf vllm 2025-09-07T09:31:59.7148143Z + popd 2025-09-07T09:31:59.7148425Z /artifacts 2025-09-07T09:31:59.7178184Z + docker exec -t fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be chown -R 1000:1000 /artifacts 2025-09-07T09:31:59.8390607Z ##[group]Run actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 2025-09-07T09:31:59.8391126Z with: 2025-09-07T09:31:59.8391404Z name: vllm-wheel-cu128-3.12-manylinux_2_28_x86_64 2025-09-07T09:31:59.8391797Z if-no-files-found: error 2025-09-07T09:31:59.8392700Z path: /home/ec2-user/actions-runner/_work/_temp/artifacts/externals/vllm/wheels/*.whl 2025-09-07T09:31:59.8393315Z compression-level: 6 2025-09-07T09:31:59.8393602Z overwrite: false 2025-09-07T09:31:59.8393872Z include-hidden-files: false 2025-09-07T09:31:59.8394176Z env: 2025-09-07T09:31:59.8394390Z PY_VERS: 3.12 2025-09-07T09:31:59.8394733Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T09:31:59.8395167Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T09:31:59.8395496Z BUILD_DEVICE: cu128 2025-09-07T09:31:59.8395844Z PYTHON_EXECUTABLE: 
/opt/python/cp312-cp312/bin/python 2025-09-07T09:31:59.8396508Z container_name: fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be 2025-09-07T09:31:59.8397039Z ##[endgroup] 2025-09-07T09:32:00.1000523Z With the provided path, there will be 3 files uploaded 2025-09-07T09:32:00.1005267Z Artifact name is valid! 2025-09-07T09:32:00.1007023Z Root directory input is valid! 2025-09-07T09:32:00.3194745Z Beginning upload of artifact content to blob storage 2025-09-07T09:32:00.9364790Z Uploaded bytes 8388608 2025-09-07T09:32:01.2444669Z Uploaded bytes 16777216 2025-09-07T09:32:01.6207063Z Uploaded bytes 25165824 2025-09-07T09:32:01.9645588Z Uploaded bytes 33554432 2025-09-07T09:32:02.3093302Z Uploaded bytes 41943040 2025-09-07T09:32:02.6791827Z Uploaded bytes 50331648 2025-09-07T09:32:03.0341231Z Uploaded bytes 58720256 2025-09-07T09:32:03.4387043Z Uploaded bytes 67108864 2025-09-07T09:32:03.7547851Z Uploaded bytes 75497472 2025-09-07T09:32:04.1224081Z Uploaded bytes 83886080 2025-09-07T09:32:04.4895187Z Uploaded bytes 92274688 2025-09-07T09:32:04.8211928Z Uploaded bytes 100663296 2025-09-07T09:32:05.1872645Z Uploaded bytes 109051904 2025-09-07T09:32:05.5791012Z Uploaded bytes 117440512 2025-09-07T09:32:05.9391816Z Uploaded bytes 125829120 2025-09-07T09:32:06.3016585Z Uploaded bytes 134217728 2025-09-07T09:32:06.6578905Z Uploaded bytes 142606336 2025-09-07T09:32:07.0132285Z Uploaded bytes 150994944 2025-09-07T09:32:07.4178377Z Uploaded bytes 159383552 2025-09-07T09:32:07.6974871Z Uploaded bytes 167772160 2025-09-07T09:32:08.0193805Z Uploaded bytes 176160768 2025-09-07T09:32:08.3369886Z Uploaded bytes 184549376 2025-09-07T09:32:08.6912612Z Uploaded bytes 192937984 2025-09-07T09:32:09.0112082Z Uploaded bytes 201326592 2025-09-07T09:32:09.3693175Z Uploaded bytes 209715200 2025-09-07T09:32:09.7415967Z Uploaded bytes 218103808 2025-09-07T09:32:10.0836539Z Uploaded bytes 226492416 2025-09-07T09:32:10.4096748Z Uploaded bytes 234881024 2025-09-07T09:32:10.7959936Z 
Uploaded bytes 243269632 2025-09-07T09:32:11.1412028Z Uploaded bytes 251658240 2025-09-07T09:32:11.4762495Z Uploaded bytes 260046848 2025-09-07T09:32:11.7937615Z Uploaded bytes 268435456 2025-09-07T09:32:12.1735631Z Uploaded bytes 276824064 2025-09-07T09:32:12.5041320Z Uploaded bytes 285212672 2025-09-07T09:32:12.8493913Z Uploaded bytes 293601280 2025-09-07T09:32:13.1893910Z Uploaded bytes 301989888 2025-09-07T09:32:13.5573088Z Uploaded bytes 310378496 2025-09-07T09:32:13.9240921Z Uploaded bytes 318767104 2025-09-07T09:32:14.2179228Z Uploaded bytes 327155712 2025-09-07T09:32:14.5718232Z Uploaded bytes 335544320 2025-09-07T09:32:14.9019313Z Uploaded bytes 343932928 2025-09-07T09:32:15.2523099Z Uploaded bytes 352321536 2025-09-07T09:32:15.5724229Z Uploaded bytes 360710144 2025-09-07T09:32:15.9127008Z Uploaded bytes 369098752 2025-09-07T09:32:16.2496517Z Uploaded bytes 377487360 2025-09-07T09:32:16.5744170Z Uploaded bytes 385875968 2025-09-07T09:32:16.9740740Z Uploaded bytes 394264576 2025-09-07T09:32:17.2651516Z Uploaded bytes 402653184 2025-09-07T09:32:17.6094213Z Uploaded bytes 411041792 2025-09-07T09:32:17.9643747Z Uploaded bytes 419430400 2025-09-07T09:32:18.3237293Z Uploaded bytes 427819008 2025-09-07T09:32:18.7045229Z Uploaded bytes 436207616 2025-09-07T09:32:19.0499754Z Uploaded bytes 444596224 2025-09-07T09:32:19.4115770Z Uploaded bytes 452984832 2025-09-07T09:32:19.7700911Z Uploaded bytes 461373440 2025-09-07T09:32:20.1555100Z Uploaded bytes 469762048 2025-09-07T09:32:20.5250594Z Uploaded bytes 478150656 2025-09-07T09:32:20.8921611Z Uploaded bytes 486539264 2025-09-07T09:32:21.4609344Z Uploaded bytes 494927872 2025-09-07T09:32:21.6128724Z Uploaded bytes 503316480 2025-09-07T09:32:21.9269765Z Uploaded bytes 511705088 2025-09-07T09:32:22.3014530Z Uploaded bytes 520093696 2025-09-07T09:32:22.6330452Z Uploaded bytes 528482304 2025-09-07T09:32:22.9864611Z Uploaded bytes 536870912 2025-09-07T09:32:23.3743850Z Uploaded bytes 545259520 2025-09-07T09:32:23.7117142Z 
Uploaded bytes 553648128 2025-09-07T09:32:24.0934175Z Uploaded bytes 562036736 2025-09-07T09:32:24.4527227Z Uploaded bytes 570425344 2025-09-07T09:32:24.7636818Z Uploaded bytes 578813952 2025-09-07T09:32:25.1306785Z Uploaded bytes 587202560 2025-09-07T09:32:25.5326043Z Uploaded bytes 595591168 2025-09-07T09:32:25.8131772Z Uploaded bytes 603979776 2025-09-07T09:32:26.1882096Z Uploaded bytes 612368384 2025-09-07T09:32:26.5176172Z Uploaded bytes 620756992 2025-09-07T09:32:26.9174411Z Uploaded bytes 629145600 2025-09-07T09:32:27.2098794Z Uploaded bytes 637534208 2025-09-07T09:32:27.5474391Z Uploaded bytes 645922816 2025-09-07T09:32:27.9278074Z Uploaded bytes 654311424 2025-09-07T09:32:28.2328259Z Uploaded bytes 662700032 2025-09-07T09:32:28.5738728Z Uploaded bytes 671088640 2025-09-07T09:32:28.9500676Z Uploaded bytes 679477248 2025-09-07T09:32:29.2967203Z Uploaded bytes 687865856 2025-09-07T09:32:29.6189492Z Uploaded bytes 696254464 2025-09-07T09:32:29.9732253Z Uploaded bytes 704643072 2025-09-07T09:32:30.3084570Z Uploaded bytes 713031680 2025-09-07T09:32:30.6284756Z Uploaded bytes 721420288 2025-09-07T09:32:31.0054738Z Uploaded bytes 729808896 2025-09-07T09:32:31.3374244Z Uploaded bytes 738197504 2025-09-07T09:32:31.7039080Z Uploaded bytes 746586112 2025-09-07T09:32:32.0637036Z Uploaded bytes 754974720 2025-09-07T09:32:32.4073915Z Uploaded bytes 763363328 2025-09-07T09:32:32.7664448Z Uploaded bytes 771751936 2025-09-07T09:32:33.1081701Z Uploaded bytes 780140544 2025-09-07T09:32:33.4474752Z Uploaded bytes 788529152 2025-09-07T09:32:33.7867931Z Uploaded bytes 796917760 2025-09-07T09:32:34.1306347Z Uploaded bytes 805306368 2025-09-07T09:32:34.4419939Z Uploaded bytes 813694976 2025-09-07T09:32:34.8101070Z Uploaded bytes 822083584 2025-09-07T09:32:35.1490035Z Uploaded bytes 830472192 2025-09-07T09:32:35.4316173Z Uploaded bytes 838860800 2025-09-07T09:32:35.7484135Z Uploaded bytes 847249408 2025-09-07T09:32:35.8509071Z Uploaded bytes 850532951 2025-09-07T09:32:35.8731856Z 
Finished uploading artifact content to blob storage! 2025-09-07T09:32:35.8735153Z SHA256 hash of uploaded artifact zip is a8e838f7c230200f4499056366b122af31449a3159ddfc19ed24e9ac3c9bd865 2025-09-07T09:32:35.8737037Z Finalizing artifact upload 2025-09-07T09:32:35.9702494Z Artifact vllm-wheel-cu128-3.12-manylinux_2_28_x86_64.zip successfully finalized. Artifact ID 3946802673 2025-09-07T09:32:35.9703558Z Artifact vllm-wheel-cu128-3.12-manylinux_2_28_x86_64 has been successfully uploaded! Final size is 850532951 bytes. Artifact ID is 3946802673 2025-09-07T09:32:35.9706552Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/17524754495/artifacts/3946802673 2025-09-07T09:32:35.9883568Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2025-09-07T09:32:35.9884042Z with: 2025-09-07T09:32:35.9884274Z env: 2025-09-07T09:32:35.9884492Z PY_VERS: 3.12 2025-09-07T09:32:35.9884832Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T09:32:35.9885265Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T09:32:35.9885584Z BUILD_DEVICE: cu128 2025-09-07T09:32:35.9885937Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T09:32:35.9886539Z container_name: fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be 2025-09-07T09:32:35.9887075Z ##[endgroup] 2025-09-07T09:32:36.0055793Z ##[group]Run set -eou pipefail 2025-09-07T09:32:36.0056170Z set -eou pipefail 2025-09-07T09:32:36.0056460Z  2025-09-07T09:32:36.0056883Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2025-09-07T09:32:36.0057408Z for _ in $(seq 1440); do 2025-09-07T09:32:36.0057847Z  # Break if no ssh session exists anymore 2025-09-07T09:32:36.0058234Z  if [ "$(who)" = "" ]; then 2025-09-07T09:32:36.0058572Z  break 2025-09-07T09:32:36.0058823Z  fi 2025-09-07T09:32:36.0059076Z  echo "." 
2025-09-07T09:32:36.0059336Z  sleep 5 2025-09-07T09:32:36.0059609Z done 2025-09-07T09:32:36.0069194Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:32:36.0069625Z env: 2025-09-07T09:32:36.0069848Z PY_VERS: 3.12 2025-09-07T09:32:36.0070195Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T09:32:36.0070607Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T09:32:36.0070929Z BUILD_DEVICE: cu128 2025-09-07T09:32:36.0071272Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T09:32:36.0071856Z container_name: fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be 2025-09-07T09:32:36.0072381Z ##[endgroup] 2025-09-07T09:32:36.0107225Z Holding runner for 2 hours until all ssh sessions have logged out 2025-09-07T09:32:36.1228939Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T09:32:36.1229540Z # ignore expansion of "docker ps -q" since it could be empty 2025-09-07T09:32:36.1229982Z # shellcheck disable=SC2046 2025-09-07T09:32:36.1230425Z docker stop $(docker ps -q) || true 2025-09-07T09:32:36.1230794Z # Prune all of the docker images 2025-09-07T09:32:36.1231199Z docker system prune -af 2025-09-07T09:32:36.1238328Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:32:36.1238905Z env: 2025-09-07T09:32:36.1239129Z PY_VERS: 3.12 2025-09-07T09:32:36.1239469Z MANYLINUX_IMAGE: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T09:32:36.1239878Z PLATFORM: manylinux_2_28_x86_64 2025-09-07T09:32:36.1240416Z BUILD_DEVICE: cu128 2025-09-07T09:32:36.1240776Z PYTHON_EXECUTABLE: /opt/python/cp312-cp312/bin/python 2025-09-07T09:32:36.1241386Z container_name: fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be 2025-09-07T09:32:36.1241927Z ##[endgroup] 2025-09-07T09:32:36.8906892Z fb05ea66628c 2025-09-07T09:32:38.0814023Z Deleted Containers: 2025-09-07T09:32:38.0814518Z fb05ea66628c1bb4d7cfe73a50663699628d01e7033a02d54776a0b921a636be 
2025-09-07T09:32:38.0814918Z 2025-09-07T09:32:38.1516911Z Deleted Images: 2025-09-07T09:32:38.1517285Z untagged: pytorch/manylinux2_28-builder:cuda12.8 2025-09-07T09:32:38.1518060Z untagged: pytorch/manylinux2_28-builder@sha256:4d39d04594f7fb158015aedb72ea01cc710b592793e002a13682d23e2d50ce6d 2025-09-07T09:32:38.1518928Z deleted: sha256:ab9df097091ae31e8771009f86b9aefd962fd8430de0128a3a032f5a2bbc4e9e 2025-09-07T09:32:38.1519361Z 2025-09-07T09:32:49.4384435Z Deleted build cache objects: 2025-09-07T09:32:49.4384860Z betx9j7nh5nf75rxwwrztr4kp 2025-09-07T09:32:49.4385163Z kravvd58o57qsyksq7ne28btb 2025-09-07T09:32:49.4385575Z o08z98ik8wrre9mtazxvhxgbv 2025-09-07T09:32:49.4385852Z z8bsw854uo34v94viz4a72fnz 2025-09-07T09:32:49.4386478Z citi5fizj1uvfifk1sknvdvux 2025-09-07T09:32:49.4386791Z o4qmba33ugipj7gslb5hldpkf 2025-09-07T09:32:49.4387091Z df2ugks0lesmc5t2ehsiq7nne 2025-09-07T09:32:49.4387383Z xwnxnfdvr4m17xqtsbnxzpmxa 2025-09-07T09:32:49.4387685Z jhqtakpj068fptytuz14ds10k 2025-09-07T09:32:49.4387973Z qulsuswm4us46r9wrkjcpv8xy 2025-09-07T09:32:49.4388277Z l32016bhcpk8tjhjn5zp5ee2j 2025-09-07T09:32:49.4388573Z puj7q1acco8f838mzc1887fpc 2025-09-07T09:32:49.4388858Z zgh5bx91z2wuknvimz49kkfx3 2025-09-07T09:32:49.4389164Z hhzyhpact8454tmfoynqjpr8x 2025-09-07T09:32:49.4389449Z yq0z9t0si7ulvfswh2t8tia9f 2025-09-07T09:32:49.4389752Z z1ewa5pi45jj3nsp33ee7eeiv 2025-09-07T09:32:49.4390035Z jlofi9s05m0ij2hosdh2ddwov 2025-09-07T09:32:49.4390325Z 763dsy5128x3vt92ltbbdpzch 2025-09-07T09:32:49.4390601Z 09d4i7rzh4ts2ow3xttmfuxbs 2025-09-07T09:32:49.4390892Z y8ktol8qr143xykeubv6apco2 2025-09-07T09:32:49.4391176Z 1kkt7opszkl6tja6k12rgpvei 2025-09-07T09:32:49.4391470Z x3ak4nexrb9kkeb1zf7wfmav0 2025-09-07T09:32:49.4391761Z x0gzt0vdy6exdlinlev9jfaru 2025-09-07T09:32:49.4392283Z xiwcx6k37qknzyn1tx5kisdi7 2025-09-07T09:32:49.4392576Z rneem0wqr34k0zj4dcwyyrz2f 2025-09-07T09:32:49.4393112Z t3acna3hc87w90fmtrkyl0x3y 2025-09-07T09:32:49.4393407Z quxkjitwobl1q7vr5x4davvso 
2025-09-07T09:32:49.4393718Z xk267f521tn02rut0qaf2p0w8 2025-09-07T09:32:49.4394010Z e1ab3hhttxlx6kmdb37g9kvrj 2025-09-07T09:32:49.4394326Z qesupi57i7elfjk33e9dhgwov 2025-09-07T09:32:49.4394617Z nr4cr1b0x8hlx2669cccazl3p 2025-09-07T09:32:49.4394926Z f7lb1ocaoqqsahivf6k051ra8 2025-09-07T09:32:49.4395231Z vafm6utvaim3ihb7g21lxgghx 2025-09-07T09:32:49.4395522Z waj8ts2ym5ldk1uz70d0oocjo 2025-09-07T09:32:49.4395821Z wg59nxvenygz60ikikt4d0chs 2025-09-07T09:32:49.4396113Z pg5n4a88ndq2774x1oxloprbg 2025-09-07T09:32:49.4396414Z j2tl4r0qcb06pfzsxxsmmksrz 2025-09-07T09:32:49.4396700Z 18bf7y3wiy74ch0ytsrtg2ik7 2025-09-07T09:32:49.4397208Z 5sdgb3u07brvgqkb5lvz7fwlm 2025-09-07T09:32:49.4397663Z 4qm8sv3szjunkcyk4efvb8cl8 2025-09-07T09:32:49.4398121Z 2a3dkfihyz2oatpeawtqcd1s5 2025-09-07T09:32:49.4398706Z a0htoaz1mag10xksp37ntybcz 2025-09-07T09:32:49.4399172Z xa71obdcsb17tt8t06id6ji2h 2025-09-07T09:32:49.4399477Z 485s2tcb6vcxx0mzok1tgzmsc 2025-09-07T09:32:49.4399765Z zgtkg0iju185rp1rewquhoy4j 2025-09-07T09:32:49.4400066Z tn8nd1d611p3rq70qgmyipgut 2025-09-07T09:32:49.4400356Z vl7q5u8mhzekqdcod3cn1qpdx 2025-09-07T09:32:49.4400800Z zu0lokesz14v7ry12wspimimw 2025-09-07T09:32:49.4401091Z wbncri7me4ft5qw6m3dfa15sv 2025-09-07T09:32:49.4401398Z oaqwroiubl93oetvicznh3y04 2025-09-07T09:32:49.4401779Z rk8v4rrm67d0aptel5tnyxzg0 2025-09-07T09:32:49.4402080Z on32syb0a7t80cca6cb1nojsy 2025-09-07T09:32:49.4402367Z b1tzdmlfdmuf2kxp741hbqxzb 2025-09-07T09:32:49.4402670Z 6y8wystw1dfpk73sjaho4mhb9 2025-09-07T09:32:49.4402957Z l4ickfih7es2k93jrio345sns 2025-09-07T09:32:49.4403260Z clu0irufv1wd0hefr5jblureu 2025-09-07T09:32:49.4403565Z mrwxhx3w4do0yw32yqnnqsldo 2025-09-07T09:32:49.4403957Z tn9usnku87v0zfnabjnwficlo 2025-09-07T09:32:49.4404257Z ugeg3dfyo5uw2mftht8b7bo76 2025-09-07T09:32:49.4404545Z padw9gc23y69ydyvm84lznok8 2025-09-07T09:32:49.4404848Z x7kg4nb169d5wrumeutdv4bze 2025-09-07T09:32:49.4405132Z 1jdzzwzr1nqdy241yjy9t7704 2025-09-07T09:32:49.4405432Z qxwd0waa6nai04vsp3cwlyayz 
2025-09-07T09:32:49.4405716Z q920se9q54qcprplvu8kc3tn2 2025-09-07T09:32:49.4406015Z bf8tiuc2c4j1zc45oqj829cx0 2025-09-07T09:32:49.4406295Z 85kojafl9hk1j3e0lu4355bfk 2025-09-07T09:32:49.4406594Z 4kkfepbaoiixjd87xg7o5p0e3 2025-09-07T09:32:49.4406878Z ns4p4cbkgl6q5ou3ocdmckn9w 2025-09-07T09:32:49.4407171Z 7f7dc86i0jo25yonwwfeux8qh 2025-09-07T09:32:49.4407458Z p4mmq8ifi5nzg6or2bi5bgav1 2025-09-07T09:32:49.4407770Z 3gx5qhdw3ogjptlryrs391a7d 2025-09-07T09:32:49.4408053Z v9iva25b8kzd29dpjxhtvnbh9 2025-09-07T09:32:49.4408349Z zm3utv4usrfao6zmhgto1i2j0 2025-09-07T09:32:49.4408633Z igovgl4i114we15v4z66yyoph 2025-09-07T09:32:49.4408929Z olokxf5yn77xx2386oxz9eo7h 2025-09-07T09:32:49.4409315Z iwm5ntsf21p0s0go15oqt26fe 2025-09-07T09:32:49.4409603Z ikltqmgiyg9pr9bu85d80u2a0 2025-09-07T09:32:49.4409883Z osg5uvszmaytqubp49fg0lxnt 2025-09-07T09:32:49.4410421Z xksk6mq881u7og4t7xyr0c076 2025-09-07T09:32:49.4410593Z 2025-09-07T09:32:49.4410713Z Total reclaimed space: 61.39GB 2025-09-07T09:32:49.4497744Z Post job cleanup. 2025-09-07T09:32:49.4552098Z Post job cleanup. 
2025-09-07T09:32:49.5571122Z [command]/usr/bin/git version 2025-09-07T09:32:49.5615060Z git version 2.47.1 2025-09-07T09:32:49.5652846Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/15876bc6-4aa1-49de-9255-88e4809cd1f1/.gitconfig' 2025-09-07T09:32:49.5662447Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/15876bc6-4aa1-49de-9255-88e4809cd1f1' before making global git config changes 2025-09-07T09:32:49.5663541Z Adding repository directory to the temporary git global config as a safe directory 2025-09-07T09:32:49.5667733Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-09-07T09:32:49.5718241Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-09-07T09:32:49.5751042Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-09-07T09:32:49.6128719Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-09-07T09:32:49.6151072Z http.https://github.com/.extraheader 2025-09-07T09:32:49.6159855Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-09-07T09:32:49.6191089Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-09-07T09:32:49.6610678Z A job completed hook has been configured by the self-hosted runner administrator 2025-09-07T09:32:49.6637711Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-09-07T09:32:49.6643514Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-09-07T09:32:49.6643934Z ##[endgroup] 2025-09-07T09:32:49.6740876Z [!ALERT!] Swap in detected! [!ALERT!] 
2025-09-07T09:33:01.1148250Z [!ALERT!] Swap out detected [!ALERT!] 2025-09-07T09:33:19.9490910Z Cleaning up orphan processes